Description

Background of the Invention



This invention relates to the management of digitized
media stream data, e.g., digitized video, and particularly
relates to the capture, storage, distribution, access and
presentation of digital video within a network
computing environment.

Extensive technological advances in microelectron- 
ics and digital computing systems have enabled digiti- 
zation of a wide range of types of information; for exam- 
ple digital representations of text, graphics, still images 
and audio are now in widespread use. Advances in com- 
pression, storage, transmission, processing and display 
technologies have recently provided the capabilities re- 
quired to extend the field of digitization to additionally 
include video information. 

Conventionally, digitized audio and video are pre- 
sented on, for example, a computer system or network 
by capturing and storing the audio and video streams in 
an interleaved fashion, i.e., segments of the two streams 
are interleaved. This requires storage of the digital audio 
and video in a single stream storage container and fur- 
ther requires retrieving chunks of interleaved audio and 
video data at an aggregate rate which matches the nom- 
inal rate of an active presentation sequence. In this way 
one unit of video (say, a frame) is physically associated
in storage with one unit of audio (say, a corresponding 
33 msec clip), and the two are retrieved from storage as 
a unit. Sequences of such audio and video units are then 
provided to a presentation and decoder digital subsys- 
tem in an alternating fashion, whereby each audio and 
video unit of a pair is provided in sequence. 

Computer systems that provide this audio and video 
management functionality typically include digital com- 
pression/decompression and capture/presentation 
hardware and software, and digital management sys- 
tem software, all of which is based upon and depends 
upon the interleaved format of the audio and video 
streams it processes. 

Currently, handling of audio and video in a network 
environment is also based on a scheme in which cap- 
ture, storage, and transmission of audio and video must 
be carried out using interleaved audio and video 
streams. This interleaving extends to the transmission 
of audio and video streams across the network in an in- 
terleaved format within transmission packets. 

Synchronization of audio with video during an active 
presentation sequence is conventionally achieved by in- 
itially interleaving the audio and video streams in stor- 
age and then presenting audio and video chunks at the 
nominal rate specified for an active presentation se- 
quence. 

In "Time Capsules: An Abstraction for Access to
Continuous-Media Data," by Herrtwich, there is disclosed
a framework based on time capsules to describe
how timed data shall be stored, exchanged, and accessed
in real-time systems. When data is stored into
such a time capsule, a time stamp and a duration value
are associated with the data item. The time capsule
abstraction includes the notion of a clock for ensuring
periodic data access that is typical for continuous-media
applications. By modifying the parameters of a clock,
presentation effects such as time lapses or slow motion
may be achieved.

While the Herrtwich disclosure provides a time capsule
abstraction for managing time-based data, the disclosure
does not provide any technique for synchronizing
time-based data based on the time capsule abstraction,
and does not address the requirements of time-based
data management in a network environment. Furthermore,
the disclosure does not address processing
of time-based data streams as a function of their
interleaved format or manipulation of that format.

Further disclosures defining the general state of the
art are to be found in IEEE Network: The Magazine of
Computer Communications, vol. 4, No. 6, November
1990 - "Network considerations for distributed multimedia
object composition and communication" by T.D.C.
Little et al, and IEEE Journal on Selected Areas in
Communications, vol. 9, No. 9, December 1991 - "Multimedia
synchronization protocols for broadband integrated
services" by T.D.C. Little et al.



Summary of the Invention

The invention provides a computer-based media
data processor in accordance with claims 15, and 12 
which follow. 

In general, in one aspect, the invention features a
computer-based media data processor for controlling
the computer presentation of digitized continuous
time-based media data composed of a sequence of
presentation units. Each presentation unit is characterized by
a prespecified presentation duration and presentation
time during a computer presentation of the media data
and is further characterized as a distinct media data
type. In the processor of the invention, a media data
input manager retrieves media data from a computer
storage location in response to a request for computer
presentation of specified presentation unit sequences, and
determines the media data type of each presentation
unit in the retrieved media data. The input manager then
designates each retrieved presentation unit to a specified
media data presentation unit sequence based on
the media data type determination for that presentation
unit. The input manager then assembles a sequence of
presentation descriptors for each of the specified
presentation unit sequences, each descriptor comprising
media data for one designated presentation unit in that
sequence, and each sequence of presentation descriptors
being of a common media data type; and then
associates each presentation descriptor with a corresponding
presentation duration and presentation time,
based on the retrieved media data. Finally, the input
manager links the presentation descriptors of each
sequence to establish a progression of presentation units
in that sequence.
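
By way of illustration only, the input manager steps described above (type determination, designation to a per-type sequence, descriptor assembly, timing association and linking) might be sketched in Python as follows. The names PresentationDescriptor and assemble_sequences, and the (media type, payload, duration) layout assumed for the retrieved data, are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class PresentationDescriptor:
    """Hypothetical descriptor: media data for one presentation unit plus timing."""
    media_type: str
    payload: bytes
    presentation_time: float   # seconds from the start of the presentation
    duration: float            # seconds
    next: Optional["PresentationDescriptor"] = None  # link to the following unit


def assemble_sequences(
    units: Iterable[Tuple[str, bytes, float]],
) -> Dict[str, List[PresentationDescriptor]]:
    """Split retrieved (type, payload, duration) units into one linked
    descriptor sequence per media data type."""
    sequences: Dict[str, List[PresentationDescriptor]] = {}
    for media_type, payload, duration in units:
        seq = sequences.setdefault(media_type, [])
        # Assumed timing rule: a unit starts when the previous unit of the
        # same sequence ends.
        start = seq[-1].presentation_time + seq[-1].duration if seq else 0.0
        desc = PresentationDescriptor(media_type, payload, start, duration)
        if seq:
            seq[-1].next = desc   # link descriptors to establish the progression
        seq.append(desc)
    return sequences


# Interleaved storage order in, separate per-type sequences out.
interleaved = [("video", b"v0", 0.04), ("audio", b"a0", 0.04),
               ("video", b"v1", 0.04), ("audio", b"a1", 0.04)]
sequences = assemble_sequences(interleaved)
```

In this sketch every descriptor in a given list shares one media data type, which mirrors the requirement that each descriptor sequence be of a common type.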

A media data interpreter of the invention indicates
a start time of presentation processing of the presentation
descriptor sequences, and accordingly, maintains a
current presentation time as the sequences are processed
for presentation. The interpreter counts each
presentation unit in the media data sequences after that
unit is processed for presentation, to maintain a distinct
current presentation unit count for each sequence, and
compares for each of the presentation unit sequences
a product of the presentation unit duration and the
current presentation unit count of that sequence with the
current presentation time after each presentation unit
from that sequence is processed for presentation.
Based on the comparison, the interpreter releases a
presentation unit next in that presentation unit sequence
to be processed for presentation when the product
matches the current presentation time count, and
deletes a presentation unit next in that presentation unit
sequence when the product exceeds the current
presentation time count.

In general, in another aspect, the invention features
a media data processor for controlling transmission of
digitized media data in a packet switching network. Such
a network comprises a plurality of client computer
processing nodes interconnected via packet-based data
distribution channels. In the invention, a remote media
data controller receives from a client processing
node a request for presentation of specified presentation
unit sequences, and in response to the request,
retrieves media data from a corresponding media access
location. A remote media data input manager of the
invention then determines the media data type of each
presentation unit in the retrieved media data, and
designates each retrieved presentation unit to a specified
media data presentation unit sequence based on the
media data type determination for that presentation unit.
Then the input manager assembles a sequence of
presentation descriptors for each of the specified presentation
unit sequences, each descriptor comprising media
data for one designated presentation unit in that
sequence, and all presentation descriptors in an assembled
sequence being of a common media data type. The
interpreter associates each presentation descriptor with
a corresponding presentation duration and presentation
time, based on the retrieved media data; and finally,
links the descriptors in each assembled sequence to
establish a progression of presentation units in each of the
specified presentation unit sequences.

A remote network media data manager of the invention
assembles transmission presentation unit packets
each composed of at least a portion of a presentation
descriptor and its media data, all presentation descriptors
and media data in an assembled packet being of a
common media data type; and releases the assembled
packets for transmission via the network to the client
processing node requesting presentation of the specified
presentation unit sequences.
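
By way of illustration only, packet assembly in which every packet carries data of a single media data type, and possibly only a portion of one presentation unit, might be sketched as follows. The Packet fields, the packetize function and the max_data size limit are assumptions made for this sketch; the specification does not give a packet layout at this point.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical transmission packet carrying data of a single media type."""
    media_type: str
    unit_index: int   # which presentation unit the fragment belongs to
    offset: int       # byte offset of this fragment within the unit's media data
    data: bytes


def packetize(media_type: str, payloads: List[bytes], max_data: int) -> List[Packet]:
    """Split each presentation unit of one sequence into packets of at most
    max_data bytes; a packet never mixes media types or units."""
    if max_data <= 0:
        raise ValueError("max_data must be positive")
    packets = []
    for index, payload in enumerate(payloads):
        for offset in range(0, len(payload), max_data):
            packets.append(
                Packet(media_type, index, offset, payload[offset:offset + max_data])
            )
    return packets


# Two video units; the first is larger than one packet and is fragmented.
video_packets = packetize("video", [b"abcdefgh", b"ij"], max_data=3)
```

Because each stream is packetized on its own, a receiver in such a scheme could route a packet to its descriptor sequence from the media_type field alone.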

A local media data controller of the invention transmits
the presentation unit sequence request to the remote
media data controller from the client processing
node, and controls starting and stopping of sequence
presentation in response to user specifications.

A local network media data manager of the inven- 
tion receives at the client processing node the transmis- 
sion presentation unit packets via the network, and des- 
ignates a presentation unit sequence for each presen- 
tation descriptor and its media data in the received pack- 
ets to thereby assemble the presentation descriptor se- 
quences each corresponding to one specified presen- 
tation unit sequence, all presentation descriptors in an 
assembled sequence being of a common media data 
type. Then the local network media data manager links 
the descriptors in each assembled sequence to establish
a progression of presentation units for each of the
presentation unit sequences.

A local media data interpreter of the invention ac- 
cepts the assembled presentation descriptor sequenc- 
es one descriptor at a time and releases the sequences 
for presentation one presentation unit at a time. In this 
process, the local interpreter indicates a start time of 
presentation processing of the sequences, and accord- 
ingly, maintains a current presentation time as the de- 
scriptor sequences are processed for presentation. 
Based on the presentation duration of each presentation 
unit, the interpreter synchronizes presentation of the 
specified presentation unit sequences with the current 
presentation time. 

In preferred embodiments, the specified media data 
presentation unit sequences comprise a video frame se- 
quence including a plurality of intracoded video frames; 
preferably, each frame of the video frame sequence 
comprises an intracoded video frame, and more prefer- 
ably, the video frame sequence comprises a motion 
JPEG video sequence and an audio sequence. In other 
preferred embodiments, each of the plurality of intrac- 
oded video frames comprises a key frame and is fol- 
lowed by a plurality of corresponding non-key frames, 
each key frame including media data information re- 
quired for presentation of the following corresponding 
non-key frames. 

In other preferred embodiments, synchronization of 
presentation of the specified presentation unit sequenc- 
es is accomplished by the local media data interpreter 
by comparing for each of the presentation descriptors 
in each of the presentation descriptor sequences the 
presentation time corresponding to that descriptor with 
the currently maintained presentation time. Based on 
this comparison, the interpreter releases a next sequen- 
tial presentation unit to be processed for presentation 
when the corresponding presentation time of that de- 
scriptor matches the current presentation time, and de- 
letes a next sequential presentation unit to be processed 
for presentation when the current presentation time
exceeds the corresponding presentation time of that
descriptor.
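
By way of illustration only, the comparison of a descriptor's presentation time with the currently maintained presentation time might be sketched as follows. The function name decide, the tolerance used to decide that two times "match", and the "wait" outcome for a unit that is not yet due are assumptions of this sketch and are not stated in the specification.

```python
def decide(unit_time: float, current_time: float, tolerance: float = 0.005) -> str:
    """Return the action for the next sequential presentation unit of one sequence.

    unit_time    -- presentation time stored in the unit's descriptor (seconds)
    current_time -- currently maintained presentation time (seconds)
    tolerance    -- assumed window within which the two times count as matching
    """
    if abs(unit_time - current_time) <= tolerance:
        return "release"   # presentation time matches the current time
    if current_time > unit_time:
        return "delete"    # the unit is late, so it is dropped rather than shown
    return "wait"          # assumed behaviour: the unit is not yet due


# One decision per sequence keeps audio and video aligned to the same clock.
actions = [decide(1.0, 1.002), decide(1.0, 1.2), decide(1.0, 0.5)]
```

Applying the same rule independently to each sequence is what lets separately stored audio and video stay aligned without interleaving.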

In other preferred embodiments, synchronization of
presentation of the specified presentation unit sequences
is accomplished by the local media data interpreter
by counting each presentation descriptor in the
sequences after that presentation unit is released to be
processed for presentation, to maintain a distinct current
presentation unit count for each sequence. Then, the
interpreter compares for each of the presentation unit
sequences a product of the presentation unit duration
and the current presentation descriptor count of that
sequence with the currently maintained presentation time
after a presentation unit from that sequence is released
to be processed for presentation. Based on the comparison,
the interpreter releases a next sequential presentation
unit in that presentation unit sequence when the
product matches the currently maintained presentation
time, and deletes a next sequential presentation unit in
that presentation unit sequence when the product
exceeds the currently maintained presentation time.

In other preferred embodiments, the remote media 
data controller of the invention receives from the local 
media data controller, via the network, an indication of 
a specified presentation data rate at which the specified 
presentation unit sequences are to be transmitted via 
the network to the client node. The media data retrieved 
comprises a plurality of storage presentation unit se- 
quences stored in a computer storage location, each 
storage presentation unit sequence composed of pres- 
entation units corresponding to a specified presentation 
unit sequence and all presentation units in a storage 
presentation unit sequence being of a common media 
data type. The remote media data input manager des- 
ignates each of a portion of the presentation unit de- 
scriptors as the descriptor sequences are assembled, 
the portion including a number of descriptors based on 
the specified presentation data rate, each designated 
descriptor comprising null media data, to thereby com- 
pose the presentation descriptor sequences with only a 
portion of storage presentation unit media data. With 
this designation, the specified presentation unit se- 
quences attain the specified presentation data rate of 
transmission. 
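
By way of illustration only, the designation of a portion of the descriptors as carrying null media data, so that a sequence attains a requested data rate, might be sketched as follows. The function name scale_sequence, the use of None for null media data, and the even spacing of the retained units are assumptions of this sketch; the specification states only that the number of null descriptors is based on the specified presentation data rate.

```python
from typing import List, Optional


def scale_sequence(payloads: List[bytes], stored_rate: float,
                   requested_rate: float) -> List[Optional[bytes]]:
    """Replace a portion of the payloads with None (null media data) so that
    roughly requested_rate / stored_rate of the units still carry data."""
    if stored_rate <= 0 or requested_rate <= 0:
        raise ValueError("rates must be positive")
    keep_ratio = min(1.0, requested_rate / stored_rate)
    scaled: List[Optional[bytes]] = []
    credit = 1.0 - keep_ratio        # chosen so the first unit is always kept
    for payload in payloads:
        credit += keep_ratio
        if credit >= 1.0:
            credit -= 1.0
            scaled.append(payload)   # unit keeps its media data
        else:
            scaled.append(None)      # null descriptor: position kept, data dropped
    return scaled


# A sequence stored at 30 units per second, requested at 15 units per second.
frames = [b"f0", b"f1", b"f2", b"f3"]
half_rate = scale_sequence(frames, stored_rate=30, requested_rate=15)
```

The output keeps one entry per original unit, so the progression of presentation units is preserved even though some entries carry no data.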

In the invention, the separation of media streams 
and distinctly formatting of network transmission pack- 
ets for each stream provides an opportunity and the fa- 
cility to examine, process, and make transmission deci- 
sions about each stream and each presentation unit in- 
dependent of other streams and presentation units. As 
a result, the media processor of the invention can make 
presentation decisions about a given presentation unit 
independent of the other units in the corresponding 
stream, and can make those decisions "on-the-fly." This
capability provides for real time scaling and network 
load adjustment as a stream is retrieved, processed, 
and transmitted across the network. 

Further aspects, features, and advantages of the
invention are set forth in the following specification and
the claims.

Brief Description of the Drawing

Fig. 1 is a schematic diagram of media stream access
and delivery points with which the digital video
management system of the invention may interface;
Fig. 2 is a schematic diagram of a stand-alone
implementation of the digital video management system
of the invention;
Fig. 3 is a schematic diagram of a network
implementation of the digital video management system
of the invention;
Fig. 4 is a schematic diagram of the local digital video
management system manager modules of the
invention;
Fig. 5 is a schematic diagram illustrating the flow of
media stream data between the stream I/O manager
and stream interpreter modules of the local digital
video management system manager of Fig. 4;
Fig. 6 is a schematic flow chart illustrating presentation
and capture scenarios carried out by the local
digital video management system manager of Fig. 4;
Fig. 7 is a schematic illustration of the translation
from media stream storage format to token format
carried out by the local digital video management
system manager of Fig. 4;
Fig. 8 is a schematic flow chart illustrating presentation
and capture scenarios carried out by a digital
video system used in conjunction with the local digital
video management system manager scenarios
of Fig. 6;
Fig. 9 is a schematic diagram of the local digital video
management system manager and the remote
digital video management manager modules of the
invention in a network implementation;
Fig. 10 is a schematic diagram illustrating the flow
of media stream data between the remote and local
digital video management manager modules of the
invention in a network implementation;
Fig. 11A is a schematic flow chart illustrating presentation
and capture scenarios carried out by the
remote digital video management system manager
of Fig. 9;
Fig. 11B is a schematic flow chart illustrating presentation
and capture scenarios carried out by the
local digital video management system manager of
Fig. 9;
Fig. 12 is a schematic illustration of the translation
of stream tokens of Fig. 7 into packet format.






Description of a Preferred Embodiment

Referring to Fig. 1, there is illustrated the digital video
management system (DVMS) 10 of the invention.
The DVMS provides the ability to capture, store, transmit,
access, process and present live or stored media
stream data, independent of its capture or storage location,
in either a stand-alone or a network environment.
The DVMS accommodates media stream data, i.e.,
continuous, high data-rate, real-time data, including video,
audio, animation, photographic stills, and other types of
continuous, time-based media data. Throughout this
description, the DVMS of the invention will be explained
with reference to audio and video streams, but it must
be remembered that any time-based media data stream
may be managed in the system. In the DVMS, as shown
in Fig. 1, media data may be accessed from, e.g., live
analog capture, analog or digital file storage, or live
digital capture from, e.g., a PBX (private branch exchange)
server, among other access points. The accessed media
is managed by the DVMS for delivery to, e.g., a
presentation monitor, a computer system for editing and
presentation on the computer, a VCR tape printer, or
digital storage, or sent to a PBX server.

Of great advantage, the DVMS management 
scheme is independent of any particular storage or com- 
pression technology used to digitize the data streams, 
and further, is independent of any particular communi- 
cation protocols or delivery platform of a network in 
which the DVMS is implemented. Additionally, the
DVMS is industry standards-based yet is flexible and 
standards-extensible, via its layered architecture, which 
incorporates multiple management platforms. Each of 
these features and advantages will be explained in de- 
tail in the discussion to follow. 

Digital Video Management System Components 

The DVMS of the invention is based on a technique 
whereby media data streams are handled and managed 
as distinct and separate media data streams in which 
there is no interleaving of media data. Here the term 
"stream" is meant to represent a dynamic data type, like 
video, as explained above, and thus, a stream consists 
of dynamic information that is to be produced and con- 
sumed in a computer system or network with temporal 
predictability. A stream contains a succession of
sequences. Sequences can themselves contain sequences;
in turn, each sequence contains a succession of
segments. Streams, sequences and segments, as information
identifiers, have no media type-specific semantics.
Rather, they are convenient abstractions for specifying 
and organizing dynamic data types to be managed by 
the management system of the invention. An easily un- 
derstood analogy to streams, sequences and segments 
is that of documents containing chapters, sections and 
sentences. 

Streams are characterized by their media data type, 
e.g., audio, video, or animation data types. Sequences 
represent information that is meaningful to the user. For 
example, a video sequence may represent a video clip 
containing a video scene. Segments can be convenient 
"chunks" of data for editing and mixing that data.
Segments may also represent units of data that are
temporally linked, as when using a video compression scheme
that produces key video frames and corresponding
following difference video frames.

In the DVMS of the invention, streams that are
intended for synchronous presentation can be grouped
into a stream group of distinct constituent streams (i.e.,
without interleaving). Although constituent streams in
such a stream group may be stored in an interleaved
form within a storage container, the DVMS can dynamically
coordinate separately stored streams; in either
case, the system processes the streams distinctly rather
than in an interleaved fashion.

Segments of streams contain presentation units. A
presentation unit is a unit of continuous, temporally-based
data to be presented, and accordingly, has an
associated presentation time and presentation duration. A
presentation time indicates the appropriate point in the
sequence of a presentation at which the associated
presentation unit is to be played, relative to a time base
for the ongoing presentation. A presentation duration
indicates the appropriate interval of time over which the
associated presentation unit is to be played in the
ongoing presentation. Thus, a video presentation unit
comprises a video frame, and an audio presentation unit
comprises a number of sound samples associated with
a frame duration.
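
By way of illustration only, the relationship between a frame duration and the number of sound samples in the corresponding audio presentation unit might be worked through as follows. The function names and the particular sample and frame rates are assumptions chosen for this sketch; the specification does not fix any rates here.

```python
def samples_per_unit(sample_rate_hz: int, frame_rate_fps: float) -> int:
    """Number of audio samples spanning one video frame duration (rounded)."""
    return round(sample_rate_hz / frame_rate_fps)


def presentation_time(unit_index: int, frame_rate_fps: float) -> float:
    """Presentation time, in seconds, of the unit at a given index, assuming
    every unit in the sequence has the same duration of one frame period."""
    return unit_index / frame_rate_fps


# Assumed rates: 44.1 kHz audio presented alongside 30 frame-per-second video.
audio_unit_size = samples_per_unit(44100, 30)
start_of_unit_60 = presentation_time(60, 30)
```

Under these assumed rates each audio presentation unit holds 1470 samples, and the unit at index 60 of either stream is due two seconds into the presentation.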

As mentioned above, the DVMS may be implemented
in a stand-alone computer system or a computer-based,
packet switched network. Referring to Fig. 2, in
a stand-alone computer system implementation 12, live
or stored media streams are accessed and captured for
presentation and editing on the stand-alone computer
14. The captured, and optionally edited, media streams
may then be delivered to a presentation monitor or to a
VCR tape printer utility.

Referring to Fig. 3, a packet switching network in
which the DVMS is implemented comprises desktop
computer systems 18 which are linked via a packet
switching network 80, which is controlled by the DVMS
network implementation 16. The network 80 may comprise
a local area network (LAN) or a wide area network
(WAN), or a combination of one or more LANs and
WANs. The DVMS provides access to and capture of
media streams from live analog video capture, e.g., a
VCR or camcorder, a network, storage or PBX server,
or one of the desktop computers, and in turn manages
the transmission of the media stream data across the
network back to any of the access points.

The digital video management system consists of a
local DVMS manager and a remote DVMS manager.
The local DVMS manager provides a client operating
environment, and thus resides on a stand-alone computer
or each client computer in a network, "client" here
being defined as a computer system or one of the access
points in a network that request media data; the
remote DVMS manager provides a network operating
environment, and thus resides on a network server. The
local DVMS manager may be implemented on, for example,
IBM-compatible personal computers running
Microsoft® Windows™, to thereby provide high-level,
industry-standard access to underlying digital video
services. This local DVMS manager implementation may
support, for example, the industry-standard Microsoft®
digital video MCI API for application development. The
local DVMS manager incorporates an efficient data-flow
subsystem, described below, that is highly portable to
other operating systems.

The DVMS system of the invention is preferably
implemented as an application programming interface
suite that includes interfaces for a computer programming
application to include media data stream management
capability within the application. Thus, the DVMS
interfaces with an underlying programming application
via interface calls that initiate media data stream
functions within the realm of the programming application.
Such an interface implementation will be understandable
to those skilled in the art of C programming.

The remote DVMS manager acts to dynamically link
a client and a server in the packet network environment.
The architecture of this manager has the important
advantage of supporting the ability to scale distinct,
non-interleaved media data streams, as discussed in depth
below. This ability to scale packet-based video, thereby
creating scalable packet video, is a facility which permits
adaptive bandwidth management for dynamic media
data types in both LANs and WANs. The remote DVMS
manager may be implemented as a NetWare® Loadable
Module, on, for example, the Novell NetWare® operating
system.

Local DVMS Manager 


The local DVMS manager manages the access and
capture of media data streams transparently, i.e.,
without impacting the functionality of the application
program which requested that access and capture. The
local DVMS manager works with a digital video system,
implemented either in special purpose digital video
hardware or in special purpose software-based
emulation of the digital hardware.

Referring to Fig. 4, the local DVMS manager 20
consists of three modules: the stream controller 24, the
stream input/output (I/O) manager 26, and the stream
interpreter 28. This modularity is exploited in the DVMS
design to separate the flow of data in a media data
stream from the flow of control information for that media
stream through the system. Based on this data and control
separation, stream data and stream control information
are each treated as producing distinct interactions
among the three manager modules, which operate as
independent agents. The I/O manager, interpreter and
controller agents are each mapped via the local DVMS
manager to independently schedulable operating system
processes with independent program control flow
and data space allocation. The flow of media stream data
is managed by the stream I/O manager 26 and the
stream interpreter 28, while the flow of control information
is managed by the stream controller 24. Each of
these management functions is explained in detail below.

The stream I/O manager module 26 is responsible
for the dynamic supply of media data streams, e.g., audio
and video streams, from or to the stream interpreter.
This module also provides efficient file format handling
functions for the media data, if it is accessed via a storage
file, e.g., a DVI® AVSS file. In a stand-alone implementation
of the DVMS of the invention, the stream I/O
manager provides retrieval and storage of media data
streams from or to points of media access, such as digital
or analog storage containers, while in a network
implementation of the DVMS, as described below, the remote
DVMS manager modules provide retrieval and
storage at points of media access via the network. Most
importantly, the stream I/O manager performs a translation
from the representation of audio and video information
as that information is stored to the corresponding
dynamic computer-based representation. This translation
is explained in detail below.

The stream interpreter module 28 is responsible for
managing the dynamic computer-based representation
of audio and video as that representation is manipulated
in a stand-alone computer or a computer linked into a
packet network. This dynamic management includes
synchronization of retrieved audio and video streams,
and control of the rate at which the audio and video
information is presented during a presentation sequence.
In addition, the stream interpreter module manages the
capture, compression, decompression and playback of
audio and video information. This module is, however,
compression technology-independent and additionally
is device-independent. Base services of a digital video
subsystem, including, for example, hardware for capture
and presentation functions, are preferably implemented
to be accessed through a standard API suite of
digital video primitives, which encapsulate any functions
unique to a particular compression or device technology.

The following suite of primitive functions provide de- 
vice-independent access to the base services of a digital 
video subsystem: 

Open: Open a specified device, initialize it, and return
a handle for further requests;

Close: Close a specified device and free up any
associated resources;

Get_Capabilities: Query a device's capabilities,
e.g., display resolutions, compression format, etc.;

Start: Start decoding and displaying data from a
stream buffer;

Stop: Stop decoding and displaying data from a
stream buffer;

Get_Info: Get information about the current status
of a device;

Set_Info: Set information in the device attributes.
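
By way of illustration only, a device-independent interface exposing the seven primitives listed above might be sketched in Python as follows. The class names DigitalVideoDevice and NullDevice, the integer handles, the argument lists and the returned dictionaries are all assumptions of this sketch; the specification names the primitives but does not give their signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class DigitalVideoDevice(ABC):
    """Hypothetical rendering of the primitive suite as an abstract interface."""

    @abstractmethod
    def open(self, device_name: str) -> int: ...
    @abstractmethod
    def close(self, handle: int) -> None: ...
    @abstractmethod
    def get_capabilities(self, handle: int) -> Dict[str, Any]: ...
    @abstractmethod
    def start(self, handle: int) -> None: ...
    @abstractmethod
    def stop(self, handle: int) -> None: ...
    @abstractmethod
    def get_info(self, handle: int) -> Dict[str, Any]: ...
    @abstractmethod
    def set_info(self, handle: int, **attributes: Any) -> None: ...


class NullDevice(DigitalVideoDevice):
    """Stand-in device that only tracks state; it performs no real decoding."""

    def __init__(self) -> None:
        self._devices: Dict[int, Dict[str, Any]] = {}
        self._next_handle = 1

    def open(self, device_name: str) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._devices[handle] = {"name": device_name, "running": False}
        return handle

    def close(self, handle: int) -> None:
        del self._devices[handle]          # free the associated state

    def get_capabilities(self, handle: int) -> Dict[str, Any]:
        # Made-up capability values, for illustration only.
        return {"resolutions": [(320, 240)], "compression": ["motion-jpeg"]}

    def start(self, handle: int) -> None:
        self._devices[handle]["running"] = True

    def stop(self, handle: int) -> None:
        self._devices[handle]["running"] = False

    def get_info(self, handle: int) -> Dict[str, Any]:
        return dict(self._devices[handle])  # copy, so callers cannot mutate state

    def set_info(self, handle: int, **attributes: Any) -> None:
        self._devices[handle].update(attributes)


device = NullDevice()
handle = device.open("video0")
device.start(handle)
status = device.get_info(handle)
```

A caller written against the abstract interface in such a design would not need to change when a different compression or device technology is substituted behind it.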

The stream controller module 24 is responsible for
the control of video and audio capture and playback
functions during user-directed applications. This control
includes maintaining the dynamic status of video and
audio during capture or playback, and additionally,
providing presentation control functions such as play,
pause, step and reverse. This module is accordingly
responsible for notifying an active application of stream
events during audio and video capture or playback. An
event is here defined as the current presentation unit
number, for which an indication would be made, or the
occurrence of the matching of a prespecified presentation
unit number with a current presentation unit number.

During the active playback of audio and video, or 
other dynamic media data streams, the stream I/O manager
and the stream interpreter act as the time-based
producer and consumer, respectively, of the data
streams being played back. Conversely, during record- 
ing of a dynamic data stream, the stream interpreter acts 
as the time-based stream producer and the stream I/O 
manager acts as the time-based stream consumer. Dur- 
ing both playback and recording, the I/O manager and 
the interpreter operate autonomously and asynchro- 
nously, and all data in an active stream flows directly 
between them via a well-defined data channel protocol. 
The stream controller asynchronously sends control 
messages to affect the flow of data between the I/O 
manager and the interpreter, but the controller does not 
itself participate in the flow of data. As discussed below, 
all data flow operations are handled using a minimal 
number of buffer copies between, for example, a disk or 
network subsystem and the digital video capture and 
presentation hardware. 
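
By way of illustration only, the playback arrangement described above, in which the stream I/O manager produces, the stream interpreter consumes, and the stream controller affects the flow without carrying data, might be sketched with two threads and a bounded queue. The function names, the end-of-stream marker and the use of a threading.Event as the control message are assumptions of this sketch and do not represent the data channel protocol of the invention.

```python
import queue
import threading
from typing import List


def io_manager(channel: "queue.Queue", units: List[bytes],
               stop: threading.Event) -> None:
    """Producer during playback: supplies units until done or told to stop."""
    for unit in units:
        if stop.is_set():
            break
        channel.put(unit)
    channel.put(None)  # end-of-stream marker (an assumption of this sketch)


def interpreter(channel: "queue.Queue", presented: List[bytes]) -> None:
    """Consumer during playback: takes units off the channel one at a time."""
    while True:
        unit = channel.get()
        if unit is None:
            break
        presented.append(unit)   # stands in for decoding and display


channel: "queue.Queue" = queue.Queue(maxsize=2)  # small buffer between the agents
stop_flag = threading.Event()   # a controller would set this to halt playback
presented: List[bytes] = []

producer = threading.Thread(
    target=io_manager, args=(channel, [b"u0", b"u1", b"u2"], stop_flag))
consumer = threading.Thread(target=interpreter, args=(channel, presented))
producer.start()
consumer.start()
producer.join()
consumer.join()
```

The controller in this sketch touches only stop_flag, never the queue, which reflects the separation of control flow from data flow; for recording, the two thread roles would simply be exchanged.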

This system design is particularly advantageous in 
that it provides tor complete transparency with respect 
to the domain of the I/O manager and the interpreter, 
thereby providing the ability to extend the system to a 
network client/server configuration, as explained below.
Moreover, this basic three-agent unit may be concate- 
nated or recursed to form more complex data and con- 
trol functionality graphs. 

In the architecture of the local DVMS manager, the
activity of one of the asynchronous agents, each time it 
is scheduled to run while participating in a stream flow, 
is represented as a process cycle. The rate at which an 
asynchronous agent is periodically scheduled is represented
as the process rate for that agent, and is measured
as process cycles per second. A process period is
defined as the time period between process cycles. In 
order to maintain continuous data flow of streams be- 
tween the stream I/O manager and the stream interpret- 
er, the limiting agent of the two must process a process 
period's worth of presentation units within a given proc- 
ess cycle. In cases in which such process rates are not 
achieved, the local DVMS manager can control the flow 
rate, as explained below. The process rate for the
stream interpreter is close to the nominal presentation
rate of the stream, i.e., in every process cycle, a presentation
unit is processed. The stream I/O manager
services several presentation units in every process cycle
and thus, its process rate may be much lower than
the presentation rate.
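
By way of illustration only, the relationship between presentation rate, process rate and per-cycle work load may be sketched in C as follows. The function name `units_per_cycle` is hypothetical and does not appear in the system described; the sketch simply assumes that both rates are expressed as whole numbers per second and rounds up, so that the limiting agent never falls short of a process period's worth of units.

```c
#include <assert.h>

/* Number of presentation units an agent must handle in one process
 * cycle to sustain continuous flow: the presentation rate divided by
 * the agent's process rate, rounded up. Both rates are per second.
 * (Hypothetical helper, for illustration only.) */
unsigned units_per_cycle(unsigned presentation_rate, unsigned process_rate)
{
    assert(process_rate > 0);
    return (presentation_rate + process_rate - 1) / process_rate;
}
```

Under these assumptions, an interpreter scheduled 30 times per second for a 30 unit-per-second stream handles one unit per cycle, while an I/O manager scheduled 5 times per second must service six units per cycle.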

The modularity of the stream control functions provided
by the stream I/O manager, interpreter and controller
makes the local DVMS manager architecture of the
DVMS highly portable to most modern computer operating
systems which support preemptive multitasking
and prioritized scheduling. This architecture also provides
for selective off-loading of the stream I/O manager
and interpreter modules to a dedicated coprocessor for
efficient data management. Most importantly, the highly
decentralized nature of the manager architecture allows
it to be easily adapted to LAN and WAN systems, as
discussed below.

Referring to Fig. 5, when a computer implemented
with the DVMS of the invention requests access to audio
or video streams, the following stream flow occurs. The
stream I/O manager 26 module retrieves the requested
streams from a stream input 30; this stream input comprises
a storage access point, e.g., a computer file or
analog video source. The stream I/O manager then separates
the retrieved streams according to the specified
file format of each stream. If two streams, e.g., audio
and video streams, which are accessed were interleaved
in storage, the stream I/O manager dynamically
separates the streams to then transform them to distinct
internal representations, each comprising a descriptor
which is defined based on their type (i.e., audio or video).
Once separated, the audio and video stream data are
handled both by the stream I/O manager and the stream
interpreter as distinct constituent streams within a
stream group. The stream I/O manager 26 then exchanges
the stream data, comprising sequences of
presentation units, with the stream interpreter 28 via a
separate queue of presentation units called a stream
pipe 32, for each constituent stream; an audio stream
pipe 33 is thus created for the audio presentation units,
and a video stream pipe 31 is created for the video presentation
units. Each audio stream (of a group of audio
streams) has its own pipe, and each video stream has
its own pipe. During playback of streams, the stream I/O
manager continually retrieves and produces presentation
units from storage and the stream interpreter continuously
consumes them, via the stream pipes, and delivers
them to a digital media data subsystem for, e.g.,
presentation to a user.

When retrieving a plurality of streams from an input
30 in which the streams are separated (not interleaved),
the stream I/O manager retrieves and queues the
streams' data in a round robin fashion, but does not perform
any stream separation function. The stream interpreter
processes these streams in the same manner as
it processes those which are originally interleaved.
Thus, the stream I/O manager advantageously shields
the remainder of the system from the nature of the static
container 30, and further "hides" the format of the storage
container, as well as the way that logically coordinated
data streams are aggregated for storage. Additionally,
the details of the stream interpreter implementation,
such as its hardware configuration, are "hidden"
from the I/O subsystem; in fact, the only means of communication
between the two agents is via the well-defined
stream pipe data conduits.

Referring also to Fig. 6, during a presentation scenario,
the stream controller 24 first initializes 36 the
stream I/O manager 26 and stream interpreter 28, by
creating active modules of them to begin processing
streams, and then defines and indicates 38 a stream
group and the corresponding constituent stream names.
The stream I/O manager 26 then retrieves 40 the named
streams from corresponding storage containers 30 and
separates the streams, if stored in an interleaved fashion.
If they were not interleaved, the streams are retrieved
in a round-robin fashion. Once the streams are
retrieved, the stream I/O manager converts 42 the
streams to an internal computer representation of
stream tokens, described below. Via the stream group
indication 30, each stream token is identified with a
stream and a stream group by the indication provided
to the stream I/O manager by the stream controller. The
I/O manager then buffers 44 the streams separately,
each in a distinct stream pipe 32 for consumption by the
stream interpreter 28; the stream controller provides
control 48 of the stream group as it is enqueued.

Referring also to Fig. 7, the I/O manager stream 
translation 42 from storage representation to stream token
representation is as follows. Typically, audio and
video data is stored in an interleaved fashion on a disk
and so upon retrieval are in an interleaved disk buffer,
as in the Intel® AVSS file format. The disk buffers 100
consist of a sequence of stream group frames 105, each
frame containing a header 106, a video frame 108, and
an audio frame 110. A separate index table (not shown)
containing the starting addresses of these stream group 
frames is maintained at the end of a file containing these 
frames. This index table permits random access to spe- 
cifically identified stream group frames. 

The disk buffers are retrieved by the I/O manager 
from the disk in large chunks of data, the size of each 
retrieved chunk being optimized to the disk track size, 
e.g., 64 K bytes each. The I/O manager examines each
retrieved stream group frame header and calculates the
starting addresses of each audio and video frame within
the stream group frame. It also retrieves the time
stamp information from the corresponding frames. A
linked list of descriptors, called tokens 112, is then generated
for the audio and video frames; each token represents
an audio or video presentation unit 114 and the
time stamp 116 for that unit. These tokens are continuously
linked into a list representing the stream pipe.
Thus, in the process described above, the stream I/O
manager retrieves interleaved data from a disk, separates
the data into distinct streams, and constructs an
internal representation of separated streams based on
separate stream pipes, one for each stream.
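
For illustration, a stream pipe built as a linked list of tokens might be sketched in C as shown below. This is a minimal sketch under stated assumptions, not the implementation of the system described: the type and function names (`token`, `stream_pipe`, `pipe_enqueue`, `pipe_dequeue`) and the field layout are hypothetical, and each token is assumed to hold only a pointer to its presentation unit within a retrieved buffer together with a time stamp in milliseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One token: a descriptor for a single presentation unit. */
typedef struct token {
    const void   *unit;     /* presentation unit data (not copied) */
    size_t        length;   /* size of the unit in bytes */
    uint32_t      time_ms;  /* time stamp for the unit */
    struct token *next;     /* next token in the stream pipe */
} token;

/* One stream pipe: a FIFO list of tokens for one constituent stream. */
typedef struct {
    token *head;
    token *tail;
} stream_pipe;

/* Producer side: link a new token onto the end of the pipe.
 * Returns 0 on success, -1 if memory could not be allocated. */
int pipe_enqueue(stream_pipe *p, const void *unit, size_t length,
                 uint32_t time_ms)
{
    assert(p != NULL);
    token *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->unit = unit;
    t->length = length;
    t->time_ms = time_ms;
    t->next = NULL;
    if (p->tail != NULL)
        p->tail->next = t;
    else
        p->head = t;
    p->tail = t;
    return 0;
}

/* Consumer side: unlink the oldest token, or return NULL if the pipe
 * is empty. The caller frees the returned token. */
token *pipe_dequeue(stream_pipe *p)
{
    assert(p != NULL);
    token *t = p->head;
    if (t != NULL) {
        p->head = t->next;
        if (p->head == NULL)
            p->tail = NULL;
        t->next = NULL;
    }
    return t;
}
```

Because each token only points at its presentation unit, separating interleaved audio and video in this sketch requires no copying of the unit data itself; one `stream_pipe` would be kept per constituent stream.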

Once the streams are enqueued in the stream
pipes, the stream interpreter 28, having been initialized
36 by the stream controller 24, accepts and dequeues
48 the constituent stream tokens of presentation units.
The debuffered streams are then scaled 50 and synchronized
52, based on control via the stream controller,
which maintains 54 the status of the stream group. The
scaling process will be described in detail below. The
synchronized streams are then delivered to the digital
presentation subsystem hardware.

The decompression scheme is based on the particular
compression format of video frames, e.g., the motion
JPEG video format. This format is one of a preferred
class of video formats, in which each frame is intracoded,
i.e., coded independently, without specification of
other frames.

Referring to Fig. 8, the digital video system 120 receives
streams from the stream interpreter and first decodes
and decompresses 122 the stream data, each
stream being processed separately. The decoded and
decompressed data streams are then stored 124 in corresponding
frame buffers, e.g., video and audio frame
buffers. At the appropriate time, the stored data is converted
126 from its digital representation to a corresponding
analog representation, and is delivered to a
playback monitor and audio speakers. The various operations
of the digital hardware subsystem are controlled
by the stream interpreter via digital video primitives,
as explained and described previously.

In the reverse operation, i.e., capture and storage
of digital video and audio streams being processed by
a computer system, the stream interpreter 28 captures
the audio and video streams from the digital hardware
subsystem 120. Before this capture, the hardware subsystem
digitizes 128 the audio and video signals, stores
130 the digitized signals in a buffer, and before passing
the digitized streams to the stream interpreter, compresses
and encodes 132 the video and audio data.

Based on the stream group control provided by the
local stream controller, the stream interpreter generates
62 time stamps for the captured streams and using the
time stamps, creates 64 corresponding stream tokens
of video and audio presentation units with embedded
time stamps. The stream tokens are then enqueued 66
to stream pipes 32 for consumption by the stream I/O
manager 26.

The piped streams are accepted and dequeued 72
by the stream I/O manager 26, and then scaled. If the
streams are to be stored in interleaved form, they are
then interleaved 76, in a process which reverses the
functionality depicted in Fig. 7. The streams are not required,
of course, to be stored in such an interleaved
form. Once the streams are interleaved, if necessary,
the streams are stored in a corresponding storage container
30.

Each of the functions of the stream controller,
stream I/O manager, and stream interpreter described
in these scenarios may be implemented in hardware or 
software, using standard design techniques, as will be 
recognized by those skilled in the art. 

Synchronization of Audio with Video 

As mentioned in the presentation process de- 
scribed above, the digital video management system of 
the invention provides synchronization of audio to video, 
and in general, synchronization between any two or 
more dynamic streams being presented. This synchro- 
nization function is inherently required for the coordinat- 
ed presentation of multiple real-time, continuous, high 
data-rate streams in a stream group. For example, the 
real-time nature of audio and video is derived from the
presentation attributes of these dynamic data types, 
which have quite different presentation attributes; full 
motion video needs to be presented as 30 frames per 
second and high quality audio needs to be presented at 
32,000 samples per second. 

Furthermore, digital video and audio data streams 
have real-time constraints with respect to their presen- 
tation. The streams are usually continuous and last from 
30 seconds-long (clips) to 2 hours-long (movies). Addi- 
tionally, the streams typically consume from about 1 
Mbit/sec to 4 Mbit/sec of storage capacity and transmis- 
sion bandwidth, depending on the particular compres- 
sion technology used for digitizing the stream. Thus, 
synchronization of differing data streams must accom- 
modate the diverse temporal aspects of the streams to 
be synchronized. 

The synchronization capability of the digital video 
management system of the invention is based on self- 
timing, and accordingly, self-synchronization, of data 
streams to be synchronized. This technique accommo- 
dates independent handling of multiple data streams 
which are together constituent streams of a stream 
group, even if the stored representations of the constit- 
uent streams are interleaved; the stream I/O manager 
separates interleaved streams before the stream interpreter
synchronizes the streams. Alternatively, independent
constituent streams may be stored
in separate file containers and be synchronized, before
presentation, with a common reference time base.

Self-synchronization also provides the ability to prioritize
one constituent stream over other streams in a
stream group. For example, an audio stream may be 
prioritized over a video stream, thereby providing for 
scalable video storage, distribution and presentation 
rates, as discussed below. This feature is particularly 
advantageous because human perception of audio is
much more sensitive than that of video. For accurate
human perception of audio, audio samples must be presented
at a smooth and continuous rate. However, human
visual perception is highly tolerant of video quality
and frame rate variation; in fact, motion can be perceived
even despite a wide variation in video quality and
frame rate. Empirical evidence shows that humans can
perceive motion if the presentation rate is between 15
and 30 frames/sec. At lower frame rates motion is still
perceivable, but artifacts of previous motions are noticeable.

The DVMS of the invention exploits this phenomenon
to optimally utilize available computing, compression
and network resources; by prioritizing the retrieval,
transmission, decompression and presentation of audio
over video within a computer system or network computing
environment, and by relying on audio-to-video
synchronization before presentation, rather than at storage,
an acceptable audio rate can be maintained while
at the same time varying the video rate to accommodate
resource availability in the system or network. Additionally,
independent management of audio and video data
streams provides many editing capabilities, e.g., the
ability to dynamically dub a video stream with multiple
audio language streams. Similarly, the synchronized
presentation of an audio stream with still pictures is provided
for by the independent stream management technique.
It must be remembered that all of the synchronization
schemes described are applicable to any type of
stream, not just audio and video streams.

As described above with reference to Fig. 6, the
synchronization of streams within a stream group is the
responsibility of the stream interpreter module during a
scaling process. The streams may be self-synchronized
using either an implicit timing scheme or an explicit timing
scheme. Implicit timing is based on the fixed periodicity
of the presentation units in the constituent streams
of a stream group to be synchronized. In this scheme,
each presentation unit is assumed to be of a fixed duration
and the presentation time corresponding to each
presentation unit is derived relative to a reference presentation
starting time. This reference starting time must
be common to all of the constituent streams. Explicit timing
is based on embedding of presentation time stamps
and optionally, presentation duration stamps, within
each of the constituent streams themselves and retrieving
the stamps during translation of streams from the
storage format to the token format. The embedded time
stamps are then used explicitly for synchronization of
the streams relative to a chosen reference time base.

Using either the implicit or explicit timing self-synchronization
schemes, a reference time base is obtained
from a reference clock, which advances at a rate
termed the reference clock rate. This rate is determined
by the reference clock period, which is the granularity of
the reference clock ticks.

The DVMS of the invention supports two levels of
self-synchronization control, namely a base level and a
flow control level. Base level synchronization is applicable
to stream process scenarios in which the stream I/O
manager is able to continuously feed stream data to
the stream interpreter, without interruption, and in which
each presentation unit is available before it is to be consumed.
In this scenario, then, the stream I/O manager
maintains a process rate and a process work load that
guarantees that the stream I/O manager stays ahead of
the stream interpreter.

The flow control level of synchronization is a modification
of the base level scheme that provides a recovery
mechanism from instantaneous occurrences of
computational and I/O resource fluctuations which may
result in the stream pipe between the stream I/O manager
and the stream interpreter running dry. This could
occur, for example, in a time-shared or multi-tasked
computer environment, in which the stream I/O manager
may occasionally fall behind the stream interpreter's
demand for presentation units due to a contention, such
as a resource or processor contention, with other tasks 
or with the stream interpreter itself. In such a scenario, 
the DVMS of the invention augments the base level of 
synchronization with a stream flow control function, as 
described below. 

Base Level Implicit Timing Synchronization

As explained above, the base level synchronization 
scheme assumes that there is no need for control of 
stream flow to the stream interpreter, and thus does not 
monitor for vacancy of the stream pipe. Implicit timing is 
based on a reference time base that is applied to each 
stream to be synchronized. 

Considering a scenario in which audio and video 
streams are to be synchronized, each presentation unit 
for the video stream to be presented might typically con- 
tain video information to be presented in a frame time 
of, e.g., 33 msec, for NTSC video play. The audio stream
might typically be divided into fixed frames of presenta- 
tion time with marginally varying samples per presenta- 
tion unit. In a storage scheme in which the audio and 
video are interleaved, these fixed units of time are set 
as the time duration for a video frame, i.e., 33 msec. 
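
The "marginally varying samples per presentation unit" can be illustrated with a short worked calculation. The sketch below is an assumption-laden example rather than part of the system described: the function name `audio_samples_in_unit` is hypothetical, and it uses the figures quoted elsewhere in this description (32,000 audio samples per second, 30 presentation units per second). Since 32,000 / 30 is not a whole number, successive units carry 1066 or 1067 samples so that no samples are lost over time.

```c
#include <assert.h>
#include <stdint.h>

/* Number of audio samples carried by the n-th fixed-duration
 * presentation unit (n counted from 0), chosen so that the running
 * total after every unit equals the sample count elapsed so far,
 * rounded down. (Hypothetical helper, for illustration only.) */
uint32_t audio_samples_in_unit(uint32_t n, uint32_t sample_rate,
                               uint32_t unit_rate)
{
    assert(unit_rate > 0);
    uint64_t start = (uint64_t)n * sample_rate / unit_rate;
    uint64_t end = ((uint64_t)n + 1) * sample_rate / unit_rate;
    return (uint32_t)(end - start);
}
```

With these figures the first three units carry 1066, 1067 and 1067 samples, totalling exactly 3200 samples for one tenth of a second.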

In this synchronization scenario, the stream inter- 
preter maintains a separate presentation unit counter for 
each stream pipe, and correspondingly, for each stream 
in the stream group. The interpreter consumes presen- 
tation units from the two streams in a round robin fash- 
ion, i.e., first one, then the other, and so on. Importantly, 
an independent presentation synchronization decision 
is made for each presentation unit, or token, of each 
stream, based on a corresponding reference time base, 
without regard to other streams. This reference time 
base indicates the current real time relative to the start 
time of the presentation unit consumption process for 
the corresponding stream. The stream counter of each 
stream pipe indicates the number of already consumed 
presentation units in the corresponding stream. Multiplying
this count by the (fixed) duration of each of the
presentation units specifies the real time which has 
elapsed to present the counted units. When this real 
time product matches the current reference time, the
next presentation unit is released for presentation. 



The stream interpreter initiates the consumption
and presentation of each presentation unit in sequence
during its presentation process cycle based on a presentation
decision scheme. This scheme implicitly assumes
that the stream interpreter is scheduled such that
the interpreter process rate is very close to the nominal
presentation rate of the corresponding stream. This
scheme is based on a comparison of a reference time
base with the amount of time required to present the
number of already-consumed presentation units, and
thus requires the use of counters to keep a count of presentation
units as they are consumed.

Base Level Explicit Timing Synchronization 

As explained previously, in the explicit timing 
scheme, stream synchronization is based on time
stamps that are embedded in the corresponding 
streams' tokens themselves. The time stamps represent 
the time, relative to the reference time base, at which 
the corresponding audio or video presentation frames 
are to be consumed and presented. The time base may 
be, for example, an external clock, or may be generated
from the embedded time base of one of the streams to 
be synchronized. The periodicity of the time stamps is 
itself flexible and can be varied depending on particular 
synchronization requirements. Time stamps may be
embedded in the streams during capture and compression
operations, as described above, or at a later time
during, for example, an editing process. Independent of 
the process by which the time stamps are embedded in 
a stream, the stamps are utilized by the stream I/O manager
and interpreter during playback processes to make
the consumption and presentation decisions. The 
stream interpreter does not maintain a presentation unit 
counter in this scheme, as it does in the implicit timing 
scheme. Rather, the embedded time stamps in the 
streams provide equivalent information. 

A time stamp for a presentation frame token con- 
sists of two 32-bit integers representing the presentation 
time and the presentation duration for that presentation
unit. The presentation time and the presentation dura- 
tion are represented in milliseconds. The presentation 
duration may be omitted if all presentation units are of 
the same duration. 

In this synchronization scheme, the interpreter 
reads the embedded time stamp of each presentation 
token, as that token is processed, to determine presen- 
tation time and duration for each presentation unit in the 
sequence. The interpreter decides on consumption and 
presentation of each presentation unit in each stream 
based on a decision scheme. This decision scheme is 
based on the assumption that the stream interpreter is 
scheduled such that its process rate is very close to the
nominal presentation rate of the corresponding stream. 
This scheme is based on a comparison of a reference 
time base with the presentation time and presentation 
duration stamp embedded in each presentation unit.
When a presentation unit's stamped presentation time corresponds
to the reference time, that presentation unit is
consumed for presentation.
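
As a minimal sketch, the explicit timing decision may be expressed in C as shown below. The time stamp layout follows the two 32-bit millisecond values described above; the names `time_stamp`, `unit_action` and `explicit_decide` are hypothetical. The sketch further assumes, for illustration, that a stamp "corresponds" to the reference time whenever the reference time falls within the unit's own duration window, and that a unit whose window has already passed is dropped rather than presented.

```c
#include <assert.h>
#include <stdint.h>

/* Embedded time stamp: two 32-bit values, both in milliseconds. */
typedef struct {
    uint32_t time_ms;      /* presentation time */
    uint32_t duration_ms;  /* presentation duration */
} time_stamp;

typedef enum { UNIT_WAIT, UNIT_PRESENT, UNIT_DROP } unit_action;

/* Decide what to do with the unit at the head of a stream pipe,
 * given that unit's stamp and the current reference time. */
unit_action explicit_decide(time_stamp ts, uint32_t ref_ms)
{
    assert(ts.duration_ms > 0);
    if (ref_ms < ts.time_ms)
        return UNIT_WAIT;      /* not yet due */
    if (ref_ms - ts.time_ms < ts.duration_ms)
        return UNIT_PRESENT;   /* reference time is inside the window */
    return UNIT_DROP;          /* the unit's time has passed */
}
```

No presentation unit counter appears in this sketch, since the embedded stamps carry the equivalent information.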

In addition to determining the appropriate time for
releasing presentation units in the sequence, both the
implicit and explicit timing schemes delete presentation
units if the appropriate release time for those units has
passed. For example, in the implicit timing scheme,
when the product of processed units and unit duration
exceeds the currently maintained time count, the next
sequential unit is deleted, rather than presented. Similarly,
in the explicit timing scheme, when the current presentation
time exceeds the time stamp presentation time
of a presentation unit, that unit is deleted, rather than
presented. In this way, synchronization of streams is
maintained, even if units arrive for presentation at a later
time than expected.

Flow Control Level Implicit Timing Synchronization

The flow control synchronization scheme augments
the base level synchronization scheme to provide for recovery
from instantaneous computational and I/O resource
fluctuations during a consume and presentation
process cycle. The base level scheme relied on the assumption
that the stream I/O manager stays ahead of
the stream interpreter to keep stream pipes from becoming
vacant, or running dry. Flow control synchronization
guards against this condition using a scheme based on
virtual presentation units.

A virtual presentation unit is one which allows the
underlying digital hardware subsystem to continue with
a default presentation for the duration of a corresponding
presentation unit, while at the same time maintaining
a consistent internal state, to thereby provide sequential
processing of a stream that is being presented, even
while the stream pipe is temporarily empty. Virtual presentation
units may be implemented in a variety of embodiments.
For example, in the case of motion JPEG
video, the playing of a virtual presentation unit would
preferably correspond to redisplaying the most recent
previous video frame. In the case of audio streams, a
virtual presentation unit would preferably correspond to
a null unit, i.e., a presentation unit consisting of null samples
that represent silence. Other virtual presentation
unit implementations are equally applicable.

During a presentation process cycle using the flow
control implicit timing scheme to synchronize stream
flow, the stream I/O manager and stream interpreter perform
the same operations described above in the base
level scheme. As explained, the interpreter maintains a
separate presentation unit counter for each stream within
the stream group being presented, to keep track of
the number of already-consumed presentation units in
each stream. Multiplying this count by the duration of
each presentation unit specifies the time at which, when
matching the reference time, the next presentation unit
in the sequence is to be presented. The stream interpreter
decides on the consumption and presentation of
each presentation unit based on a decision scheme
which assumes that the interpreter is scheduled at a
process rate that is close to the nominal stream presentation
rate. In this scheme, when the interpreter finds
that a presentation token is not available from the
stream pipe, and that the reference time and presentation
unit count indicate that a presentation unit is needed,
a virtual presentation unit is generated and consumed
for presentation.

Flow Control Level Explicit Timing Synchronization 

During a presentation process cycle using the explicit
timing synchronization mechanism augmented
with flow control capability, each presentation token in
the stream group being presented is assumed to include
its own embedded time stamp for presentation time and
duration. As in the explicit timing scheme without flow
control, the stream interpreter examines each embedded
time stamp to decide on the consumption policy of
the corresponding presentation unit in the stream pipes
set up by the stream I/O manager. The consumption policy
is determined based on a decision scheme, which
assumes, as did the other schemes, that the process
rate of the stream interpreter is close to the nominal
presentation rate of the corresponding stream. In this
scheme, when it is determined that another presentation
unit is not available from the stream pipe and a unit
should be presented, a virtual presentation unit is generated
based on a default presentation duration, and
that unit is then consumed for presentation.

Additionally, in the flow control schemes of either 
implicit or explicit timing, capability is provided to skip 
over presentation units. This capability is invoked
whenever a previously unavailable presentation unit lat- 
er becomes available. In the explicit timing scheme, the 
time stamp of a later available unit will never match the 
reference time after the presentation of a virtual presen- 
tation unit, and thus that unit will never be presented, 
and will be discarded. In the implicit timing scheme, the 
presentation of a virtual presentation unit in place of an 
unavailable presentation unit advances the presenta- 
tion unit counter, as does any presented unit. When the 
unavailable unit is then later available, the presentation 
unit count will be advanced such that the product of the 
count and the fixed presentation unit duration will not 
permit presentation of that unit. 
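
The explicit timing flow control behavior, including the skipping of late units, can be sketched in C as below. The sketch rests on several assumptions that are not drawn from the description: all names (`x_state`, `x_action`, `x_explicit_step`) are hypothetical, the interpreter is assumed to track the time at which the next unit is due, and a late unit is recognized by its stamp falling before that time because a virtual unit has already filled its slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t time_ms;      /* presentation time */
    uint32_t duration_ms;  /* presentation duration */
} time_stamp;

typedef enum { X_WAIT, X_PRESENT, X_VIRTUAL, X_DISCARD } x_action;

typedef struct {
    uint32_t next_ms;     /* time at which the next unit is due */
    uint32_t default_ms;  /* default duration of a virtual unit */
} x_state;

/* One decision step. `head` is the stamp of the token at the head of
 * the stream pipe, or NULL when the pipe has run dry. */
x_action x_explicit_step(x_state *st, const time_stamp *head,
                         uint32_t ref_ms)
{
    assert(st != NULL);
    if (head != NULL && head->time_ms < st->next_ms)
        return X_DISCARD;  /* late unit: its slot was already filled */
    if (head == NULL) {
        if (ref_ms < st->next_ms)
            return X_WAIT;
        st->next_ms += st->default_ms;  /* virtual unit fills the slot */
        return X_VIRTUAL;
    }
    if (ref_ms < head->time_ms)
        return X_WAIT;
    st->next_ms = head->time_ms + head->duration_ms;
    return X_PRESENT;
}
```

In this sketch a unit stamped 33 msec that arrives after a virtual unit has covered the 33 to 66 msec slot is discarded, and presentation resumes with the unit stamped 66 msec.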

Coding of the four synchronization processes de- 
scribed above into instructions suitable for implement- 
ing the synchronization techniques will be understand- 
able to those having ordinary skill in the art of C pro- 
gramming. 

Self-Synchronization Features

The four self-synchronization schemes described 
above provide several critical advantages in the digital
video management scheme of the invention. Self-synchronization
accommodates the ability to dynamically
associate distinctly stored streams with a common
stream group. Thus, for example, audio and video
streams may be stored in separate file containers and 
grouped dynamically during retrieval from storage for 
synchronized presentation. As discussed above, this 
synchronization of constituent audio and video streams 
provides, for example, for the function of dubbing of vid- 
eo with audio, and synchronizing still video with audio. 
Additionally, using the stream synchronization tech- 
nique, stream segments from different file containers 
can be dynamically concatenated into one stream. In the 
case of explicit self-synchronization, the stream I/O
manager marks the first presentation unit in a stream 
segment with a marker indicating the start of a new 
stream segment. Then when the stream interpreter con- 
sumes this presentation unit, the interpreter reinitializes 
the reference time base for the corresponding stream. 

Self-synchronization further accommodates the 
ability to adapt to skews in the clock rates of audio and 
video hardware used to play audio and video streams 
which are being synchronized. For example, an audio
stream recorded at an 11, 22 or 33 KHz sampling rate
must be played back at exactly the sampling rate for ac- 
curate audio reproduction. Similarly, a video stream re- 
corded at 30 frames per second must be played back at 
that same rate. The audio and video hardware playing 
these streams thus must each use clocks adapted for 
the particular play rate requirement of the corresponding 
stream. Any skew in the clock rates would cause drifting 
of the playing streams, and thus destroy synchroniza- 
tion of the streams, if the skew were to be uncorrected. 
Self-synchronization achieves this correction automatically
using a reference time base which the audio and
video time bases are checked against; the consumption 
rate of a stream is adjusted to drop presentation units 
periodically, if necessary, if a skew in one of the time 
bases, relative to its prescribed correspondence with 
the reference time base, is detected, thereby maintain- 
ing synchronization with respect to the reference time 
base and the other stream. 

The self-synchronization schemes provide the ca- 
pability to vary the inherent presentation rate of streams. 
For example, a video stream captured in PAL format, 
based on 25 frames per second, may be played in the 
NTSC format, which is 30 frames per second, albeit with
some loss of fidelity. In general, any stream may be
played at a custom rate, independent of the rate at which 
the stream was captured. In fact, it is often desirable in 
video playback to either speed up or slow down the nom- 
inal presentation rate of the video. Using the self-syn- 
chronization technique, the video presentation rate may 
be, for example, sped up by a factor of 2 by simply advancing
the reference time base at twice the real time
rate. Conversely, the presentation may be slowed by
half by advancing the reference time base at one half
the real time rate. In these cases, the total time elapsed
for the presentation will be, of course, one half or twice
the elapsed time for the presentation made at the nominal
rate.
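
The effect of advancing the reference time base faster or slower than real time may be sketched as follows. This is a minimal illustration; the class ReferenceTimeBase, its rate parameter, and the helper real_time_to_present are hypothetical names introduced here, and exact fractions are used only to keep the arithmetic free of rounding error.

```python
from fractions import Fraction


class ReferenceTimeBase:
    """Hypothetical reference time base that advances at `rate` times real time."""

    def __init__(self, rate=1):
        self.rate = Fraction(rate)
        self.now = Fraction(0)  # current reference time, in seconds

    def advance(self, real_seconds):
        # Reference time moves `rate` seconds for every real second that elapses.
        self.now += Fraction(real_seconds) * self.rate
        return self.now


def real_time_to_present(nominal_duration_s, rate):
    """Real seconds needed to present a stream whose nominal duration is given."""
    return Fraction(nominal_duration_s) / Fraction(rate)
```

With a rate of 2, five real seconds advance the reference time by ten seconds, so a 60 second presentation completes in 30 real seconds; with a rate of one half, the same presentation takes 120 real seconds.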

Stream Scalability

A scalable stream is a stream that can be played at 
an aggregate nominal presentation rate with variable 
data rates, under computer control. Of course, variation 
in the data rate may affect the quality, fidelity or presentation
rate of the stream. The coupling of stream scalability
with stream self-synchronization provides a powerful
control mechanism for flexible presentation of audio
and video stream groups.

As discussed above, scalability allows the DVMS to
optimize utility of computer system resources by adjusting
stream rates according to utility availability. In the
case of audio and video streams, the stream interpreter
may be programmed to give higher priority to audio
streams than video streams, and thus consume audio
presentation units at the nominal audio presentation
rate, but consume video units at an available presentation
rate. This available presentation rate is determined
by the available computational resources of a given
computer system. Different computer systems having
varying performance characteristics require differing
amounts of time to accomplish presentation operations.
Such operations involve decompression, format conversion
and output device mapping. In particular, a compressed
Motion JPEG video stream has to be Huffman
decoded, DCT decompressed, converted to RGB color
space, and mapped to a 256 color VGA palette by the
digital hardware subsystem before presentation within
an IBM PC-compatible personal computer system; different
computer systems require various time periods to
accomplish these tasks. Thus, the management system
of the invention adapts to any computer performance
characteristics by adjusting the scale of the stream flow
rate to accommodate the availability of utilities in that
computer.
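
One simple way to picture the available presentation rate is as the number of video frames that fit into the processing time left over after audio has been served. The sketch below assumes a fixed per-frame processing cost and a fixed audio cost per second; the function name and all parameters are hypothetical and serve only to illustrate the idea.

```python
def available_video_rate(nominal_fps, frame_cost_ms, audio_cost_ms_per_s):
    """Estimate the video rate a machine can sustain while audio runs at its nominal rate.

    frame_cost_ms: assumed time to decompress, convert and map one video frame.
    audio_cost_ms_per_s: assumed time spent on audio during each second of playback.
    """
    # Audio has priority, so its cost is taken out of each second's budget first.
    budget_ms = 1000 - audio_cost_ms_per_s
    if budget_ms <= 0:
        return 0
    # Video gets whatever fits in the remaining time, capped at the nominal rate.
    return min(nominal_fps, budget_ms // frame_cost_ms)
```

Under these assumptions, a machine needing 50 ms per frame and 100 ms per second for audio would present 18 of the nominal 30 frames per second, while a machine needing 20 ms per frame would present all 30.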

Most importantly, the stream scalability feature of
the digital video management system of the invention
provides the ability to comprehensively manage distribution
of digital streams over packet networks. The
DVMS exploits this capability in a network embodiment
providing management protocol schemes for client-server
sessions, as well as management protocol
schemes for storing, accessing, retrieving and presenting
streams over a LAN or WAN. The system thereby
accommodates on-demand retrieval and playback of
stored streams, and injection and tapping of multicast
live streams over packet networks. The managed digital
streams may be stored in ordinary computer files on file
servers, or may be generated from live analog sources
and made accessible over a LAN or WAN. Such access
may be on-demand, as mentioned above, as in retrieval
and presentation from a stored file, or on-schedule, as
in injection and tapping from a broadcast channel. The
management protocol schemes provided by the DVMS
will be fully described below.

Referring now to Fig. 9, in a network implementation,
the local DVMS manager 20 accesses digital media
streams located elsewhere in the network via the remote
DVMS manager 82 of the management system; the local
DVMS manager provides a client operating environment,
while the remote DVMS manager provides a network
operating environment. Via the network 80, the local
DVMS manager 20 and the remote DVMS manager
82 transmit control messages and digital media data
streams as they are requested by a computer client connected
in the network.

Remote DVMS Manager 

The remote DVMS manager 82 manages network 
control of digital media streams via four independent
modules, namely, a remote stream controller 84, a re- 
mote stream input/output (I/O) manager 86, a remote 
network stream I/O manager 88, and a local network 
stream I/O manager 90. 

In this DVMS network implementation, the local 
DVMS manager 20, residing locally to a client computer 
in the network, comprises a local stream controller 24, 
local stream I/O manager 26 and local stream interpret- 
er 28. The local network stream I/O manager 90 of the 
remote DVMS manager directly interfaces with the local 
DVMS manager locally. 

The remote stream controller 84 resides on a remote
storage device or access point, e.g., a video server,
in the network. This controller is responsible for managing
the remotely stored streams, e.g., video files, and
thereby making them available for on-demand access 
by the local stream controller module of the local DVMS 
manager. Client-server session management protocols 
control this access. The remote stream controller also 
provides a link for feedback control from the local DVMS 
manager to the remote DVMS manager, as described 
below. 

The remote stream I/O manager 86 also resides on 
a remote server; it is responsible for dynamically retriev- 
ing and storing streams from or to a storage container 
in the remote storage server. Efficient access to stored 
stream information and handling of file formats is provided
by this module. Thus, the remote stream I/O manager
performs the same tasks as those performed by
the stream I/O manager of the local DVMS manager in
a stand-alone computer implementation, tasks including
translation between stored stream representations
and corresponding dynamic computer-based token representations.

The remote network stream I/O manager 88, implemented
on a remote server, regulates transmission of
streams across the network to and from a local DVMS
manager with which a communications session has
been initiated. This transmission comprises stream exchange
between the remote network stream I/O manager
88 and the local network stream I/O manager 90,
which resides locally with respect to the local DVMS
manager modules, on a client in the network. Stream
transport protocols control the transmissions. The local
network stream I/O manager 90 receives streams from
the network and delivers them to the local DVMS stream
interpreter 28 during playback processes; conversely, it
receives streams from the local stream interpreter and
transmits them over the network during recording and
storage processes.

The DVMS of the invention provides protocols for 
managing the interaction and initialization of the local 
DVMS manager modules and the remote DVMS manager
modules just described. Specifically, four classes
of protocols are provided, namely, access protocols, for
stream group naming and access from a stream server
or injector; transport protocols, providing for stream
read-ahead, and separation and prioritization of 
streams; injection/tap protocols, providing the capability 
to broadcast scheduled streams, e.g., video streams, to 
selected network clients; and feedback protocols, ac- 
commodating the management of adaptive computa- 
tional resources and communication bandwidths. 

When the DVMS is configured in a network environment,
remote media data stream file servers in the network
advertise the stream groups controlled in their domain
based on a standard network advertisement protocol.
For example, in the Novell® Netware™ environment,
servers advertise based on the Service Advertising
Protocol (SAP). Each video server is responsible
for a name space of stream group containers that it advertises.

As shown in Fig. 9, when an application running on 
a computer (client) connected in the network opens a 
stream group container by name to access the container 
contents, the DVMS initializes the corresponding local 
stream controller 24 of the local DVMS manager to ac- 
cess the corresponding stream group. The local stream 
controller then sets up a client-server session with the 
appropriate remote stream controller 84 based on the
stream group container name that the application wish- 
es to access and the remote server's advertisement. 
The local stream controller may access multiple stream 
group containers during a single session. This capability 
results from the name service architecture employed by
the remote DVMS manager. In this scheme, a domain 
of container names is accessed via a single access call, 
whereby multiple containers in the domain are simulta- 
neously available for access. 

The local stream controller 24 then initializes the local
network stream I/O manager 90 of the remote DVMS
manager, and commences a stream read-ahead operation,
described below, with the appropriate remote
stream controller 84. In turn, that remote stream controller
initializes the corresponding remote stream I/O manager
86 and remote network stream I/O manager 88 to
handle retrieval and transmission of the constituent
streams within the accessed stream group.

The stream read-ahead operation is employed to
reduce latency perceived by a client when a stream
group presentation is begun; stream retrieval, transmission,
and scaling require a finite amount of time and
would be perceived by a client as a delay. In the read-ahead
operation, the remote stream I/O manager, the
remote network stream I/O manager, and the local network
stream I/O manager retrieve, transmit, and scale
the streams at the very start of a client-server session,
even before the client requests stream presentation. In
this scheme, the streams are ready for immediate consumption
by the local stream interpreter, via the stream
pipes, whenever a user specifies the start of presentation,
and possible presentation delays are thereby eliminated
or minimized.

Referring now to Fig. 10, when a network client requests
access to a specified stream group, the following
procedure is implemented. Upon initialization from the
request, and based on the network servers' stream
group advertisements, the appropriate remote stream
I/O manager 86 retrieves stored streams, e.g., audio and
video streams, from the appropriate file storage 30 containing
the requested stream group. The manager then
separates the retrieved streams, if necessary, thereby
producing separate audio and video presentation unit
streams, and enqueues corresponding stream descriptor
tokens in separate stream pipes 87, one pipe for
each presentation unit token stream.
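
The separation step just described may be pictured with the following minimal sketch, which assumes that each retrieved unit is tagged with the identifier of the stream to which it belongs. The function name separate_streams and the (stream_id, token) representation are assumptions made for illustration only.

```python
from collections import deque


def separate_streams(interleaved_units):
    """Split an interleaved sequence of (stream_id, token) pairs into one pipe per stream."""
    pipes = {}
    for stream_id, token in interleaved_units:
        # Each stream gets its own FIFO pipe, created the first time the stream is seen.
        pipes.setdefault(stream_id, deque()).append(token)
    return pipes
```

Given the interleaved sequence a0, v0, a1, v1, this sketch yields an audio pipe holding a0, a1 and a video pipe holding v0, v1, each in its original order.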

The remote network stream I/O manager 88 con- 
sumes the presentation unit tokens from each of the 
stream pipes, assembles transmission packets based 
on the streams, and releases them for transmission 
across the network 80 directly to the corresponding local 
network stream I/O manager 90, based on the DVMS 
stream data transport protocols; the particular transport 
protocol used is set by the network environment. For example,
in a Novell® network, the Netware SPX protocol
is used for stream data transport. The local network 
stream I/O manager 90, upon receipt of the transmitted 
presentation units, queues the presentation units in sep- 
arate stream pipes 32 for each stream to be consumed 
by the local stream interpreter 28 for use by the client 
computer's digital media hardware subsystem 34. 

Referring to Fig. 11A, illustrating the remote DVMS
functions in more detail, upon initialization, the remote
stream controller 84 initializes the remote stream I/O
manager 86 and the remote network stream I/O manager
88 by creating 130, 136 active modules of each of
the managers. It also specifies 132 the requested
stream group for access by the two managers. Control
134 of the specified stream group is provided throughout
the duration of the managers' functions.

The remote stream controller 84 also provides management
138 of the client/server session which proceeds
between the local and remote DVMS systems as
a result of the stream group request. Based on information
provided by the local DVMS manager which requested
the stream group, the remote stream controller
receives 140 a desired rate value from the local DVMS;
this rate value indicates the rate at which the streams
are to be presented, and is explained more fully below.
The remote stream controller specifies 142 this rate to
each of the remote stream I/O manager 86 and the remote
network stream I/O manager 88, which each receive
144 the rate.

The remote stream I/O manager 86 retrieves, separates,
and scales 146 audio and video streams from
the appropriate stream container 30. If the streams were
stored separately, rather than interleaved, the streams
may be individually scaled at this point, while if the
streams were interleaved, the remote network stream
I/O manager 88 later scales the streams, as explained in
detail below.

In a process explained previously with reference to
Fig. 7, the remote stream I/O manager creates 148
stream tokens corresponding to the stream presentation
unit frames retrieved from storage, and enqueues 150
the stream tokens for delivery to the remote network
stream I/O manager via individual stream pipes 32.

The remote network stream I/O manager 88 dequeues
152 the tokens from the stream pipes and, if necessary,
scales 154 the tokens. The tokens are then formatted
156 for transmission packets, and released to
the network for transmission.

Referring also to Fig. 12, the packet format process
156 is implemented as follows. Each token 114 in the
token streams 112 is enqueued in a buffer 118, whereby
each buffer contains tokens and associated media
frame data from one stream only, even if the streams
were originally interleaved in storage. Tokens, along
with corresponding media data from the buffers, are
then sequentially ordered in packets 120 in such a manner
that each token and the corresponding media data
remain associated. Because of this association, and because
tokens are likely to be time stamped, the storage
format and congruency of the stream need not be
preserved in the transmission packets during
transmission.
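
A minimal sketch of such a packet format process is given below. It assumes one buffer per stream, a hypothetical Token record carrying a time stamp together with its media data, and a hypothetical max_payload size limit; none of these names or the size-based packing rule is taken from the described system.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical presentation unit token kept together with its media data."""
    stream_id: str
    timestamp_ms: int
    data: bytes


def build_packets(buffers, max_payload):
    """Pack each stream's buffered tokens into packets holding units of that stream only.

    buffers: dict mapping stream_id to a list of Token objects in presentation order.
    Returns a list of (stream_id, [Token, ...]) packets.
    """
    packets = []
    for stream_id, tokens in buffers.items():
        current, size = [], 0
        for token in tokens:
            # Start a new packet when adding this unit would exceed the payload limit.
            if current and size + len(token.data) > max_payload:
                packets.append((stream_id, current))
                current, size = [], 0
            current.append(token)
            size += len(token.data)
        if current:
            packets.append((stream_id, current))
    return packets
```

Because every packet in this sketch carries tokens of a single stream, a receiver can examine or discard an individual packet without parsing an interleaved storage syntax.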

This packet format scheme provides dramatic advantages
over the conventional packet format scheme
of the prior art. In the conventional packet protocol the
stored media data format, which is typically interleaved,
is preserved in the transmission packet format. Thus, in
this scheme, audio and video streams are transmitted
across a network in packets containing a sequence of
interleaved headers, audio frames, and video frames,
and thus, the specific syntax by which the interleaved
streams were stored is replicated in the packet format.

In contrast, in the packet format scheme of the invention,
the separation of streams and distinct formatting
of packets for each stream provides an opportunity
and the facility to examine, process, and make transmission
decisions about each stream and each presentation
unit independent of other streams and presentation
units. As a result, the local DVMS manager can
make presentation decisions about a given presentation
unit token independent of the other tokens in the corresponding
stream, and can make those decisions "on-the-fly".
This capability provides for real time scaling and
network load adjustment as a stream is retrieved, processed,
and transmitted across the network. The conventional
prior art scheme does not have any analogous facility,
and thus cannot provide the synchronization, scaling,
and rate control features of the invention.

Referring to Fig. 11B, once the stream group is
transmitted across the network, the local DVMS manager
processes the stream group for presentation. The
local stream controller 24 manages 158 the client/server
session communication with the remote stream controller
84. Like the remote stream controller, it also creates
160, 162 instances of active processors, here initializing
the local network stream I/O manager 90 and the local
stream interpreter 28. The local stream controller creates
164 the stream grouping of interest and controls
166 that group as the local network stream I/O manager
90 and stream interpreter 28 process the group.

The local network stream I/O manager 90 receives
168 the transmitted network packets and assembles
presentation units as they are received. Then it creates
170 stream tokens from the received packets and enqueues
172 them to individual stream pipes. The stream
interpreter 28 dequeues 174 the tokens from the stream
pipes and scales 176 the tokens as required, in a process
discussed below. Then, using the synchronization
schemes explained previously, the streams are synchronized
178 and sent to the digital hardware subsystem
for presentation. The functions of this hardware
were explained previously with reference to Fig. 8.

In the reverse process, i.e., when recording streams
from a network client for storage on a remote stream
server, as shown in Figs. 11A and 11B, the digital stream
hardware subsystem provides to the local stream interpreter
28 the stream data, and based on the playing format
of the streams, the local stream interpreter generates
180 corresponding time stamps, for use in synchronization
and scaling. Stream tokens are then created
182 and enqueued 184 in the stream pipes.

The local network stream I/O manager dequeues
186 the stream tokens from the pipes and scales 188
the streams based on their play rate, record rate, and
storage format, as discussed below. Then packets are
formed and transmitted 190 via the network to the remote
server location on which the corresponding remote
DVMS exists.

Thereafter, the remote network stream I/O manager
88 receives 192 the transmitted packets and creates
194 stream tokens based on the packets. The tokens
are then enqueued 196 in stream pipes for consumption
by the remote stream I/O manager. The remote stream
I/O manager dequeues 198 the tokens from the stream
pipes, and scales 200 the streams if necessary. Finally,
it interleaves the streams, if they are to be stored in an
interleaved format, and stores 202 the streams in appropriate
stream containers on the server.

Figures 11A and 11B illustrate that the network implementation
of the DVMS of the invention is an elegant
and efficient extension of the stand-alone DVMS implementation;
this extension is possible as a result of the
modularity in design of each processing entity. Specifically,
the details of packet transport are transparent to
the remote stream I/O manager; it functions in the same
manner as a stand-alone stream I/O manager. Similarly,
presentation unit token streams provided to the local
stream interpreter do not contain transmission-specific
formats.

As a result, the local DVMS manager, when implemented
in a network environment, is easily reconfigured
to provide a remote DVMS manager which includes a
corresponding remote stream I/O manager, with the addition
of a remote network stream I/O manager; and a
local DVMS manager which includes a corresponding
local stream interpreter, and a local network stream I/O
manager from the remote DVMS manager. Exploiting
this modularity, programming applications may be created
which are supported by the DVMS functionality
without them perceiving a functional difference between
a local, stand-alone type stream scenario and a remote,
network stream scenario.

Additionally, as will be recognized by those skilled 
in the art, these processes may alternatively be imple- 
mented in hardware using standard design techniques 
to provide the identical functionality. 

Scalable Stream Rate Control 

In the network embodiment of the DVMS of the in- 
vention, the remote and local DVMS managers operate 
together to provide control of the rate of flow of streams 
through a network during stream transmission. As men- 
tioned above, this capability is particularly advanta- 
geous in handling audio and video streams to accom- 
modate fluctuations in network utility availability by pri- 
oritizing audio stream rate over video stream rate. 

This priority is based on the premise that human vis- 
ual perception of motion is highly tolerant of variations 
in the displayed quality and frame rate of presented video.
Typically, humans perceive motion when a video
presentation rate is at least 15 frames per second.
Moreover, instantaneous and smooth variations in
video presentation rates are practically unnoticeable.
However, human aural perception is quite intolerant of
variations in audio presentation quality or rate. Typically,
humans perceive noise when a constant audio presentation
rate is not maintained, and perceive "clicks" when
brief periods of silence are injected into an audio stream.
Thus, the DVMS system prioritizes audio streams over
video streams. This prioritization of audio over video extends
over the entire data flow of audio and video
streams in a network, starting from their retrieval from 
storage containers and ending with their presentation. 

Control of the rate of streams through a network
based on this audio prioritization scheme may be initiated
automatically, or in response to a direct user request.
Each type of control request is discussed below
in turn. The remote DVMS manager responds to each
type in the same manner, however.

Referring again to Fig. 11A, remote stream controllers
84 in the network are responsible for instructing the
corresponding remote stream I/O manager 86 and remote
network stream I/O manager 88 as to the percentage
of the nominal presentation rate (at which the
stream would normally be presented) at which the stream
should actually be retrieved and transmitted. The remote
stream controller receives 140 the desired rate value
via network communication with the local stream
controller 24 and specifies 142 this rate to the remote
stream I/O manager 86 and the remote network stream
I/O manager 88, which each receive 144 the rate value.

The stream rate control mechanism is carried out
by either the remote stream I/O manager or the remote
network stream I/O manager, depending on particular
stream access scenarios. As explained above, if the requested
audio and video streams are interleaved in storage
in, e.g., the Intel DVI AVSS file format, the remote
stream I/O manager retrieves the streams in that interleaved
form, separates the streams into distinct
streams and creates corresponding presentation unit
tokens. The remote stream I/O manager does not, in this
scenario, have the ability to manipulate the streams distinctly
because they are retrieved interleaved. In this
case the remote network stream I/O manager, which
obtains the streams from the stream pipe after they have
been separated, controls the rate of each stream
before forming stream packets for network transmission.

If the streams to be retrieved are individually stored, 
the remote stream I/O manager may control the rate of 
the streams as they are each separately retrieved and 
corresponding tokens are created. In this case, the rate 
control functionality of the remote network stream I/O 
manager is redundant and does not further change the 
stream rate before the stream is transmitted across the
network.

Rate control of noninterleaved streams is provided
by the remote stream I/O manager during the scaling
process 146, in which case the remote stream I/O manager
retrieves stream frames from the storage container
while skipping over appropriate stream frames to
achieve the prespecified stream rate. The stream
frames which are skipped over are determined based
on the particular compression technology that was applied
to the stream. The remote stream I/O manager
substitutes virtual presentation units for the skipped
stream frames to maintain sequential continuity of the
stream.
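
For a stream in which every frame may be skipped independently, the skipping and substitution described above might be sketched as follows. The function name scale_stream, the (timestamp_ms, data) frame representation, the use of None as the payload of a virtual presentation unit, and the even-spacing rule are all assumptions of this sketch.

```python
def scale_stream(frames, percent):
    """Keep roughly `percent` of the frames, replacing the rest with virtual units.

    frames: list of (timestamp_ms, data) pairs in presentation order.
    Returns a list of the same length; a skipped frame becomes (timestamp_ms, None),
    so the sequence of time stamps is preserved.
    """
    scaled = []
    credit = 0
    for timestamp_ms, data in frames:
        # Accumulate the requested percentage; a real frame is sent each time a
        # full 100 percent of credit has built up, which spreads kept frames evenly.
        credit += percent
        if credit >= 100:
            credit -= 100
            scaled.append((timestamp_ms, data))
        else:
            scaled.append((timestamp_ms, None))
    return scaled
```

At 50 percent of the nominal rate, this sketch passes every second frame and substitutes a virtual unit for each of the others; at 100 percent the stream is returned unchanged.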

As explained previously regarding flow control synchronization
schemes, a virtual presentation unit comprises
a presentation unit with some amount of substitute
media data information for maintaining a consistent
internal state of stream unit sequence, even while a next
sequential unit is unavailable. Here, in the case of scaling,
where virtual units are employed to scale the transmission
rate of streams, virtual units are additionally employed
to reduce the amount of presentation unit data
that is transmitted.

Accordingly, here a virtual video presentation unit
comprises a null presentation unit, having a specified
presentation duration and time, or a time stamp, but not
containing any frame presentation information. Then,
when the remote stream I/O manager substitutes a virtual
presentation unit for a skipped stream frame, a
transmission packet including the virtual presentation
unit is shorter and more quickly transmitted than it would
be if the skipped frame were included. When the local
stream interpreter and digital presentation subsystem receive
and process the null video unit, they interpret that
unit as an instruction to re-present the most recently presented
frame. In this way, the presentation subsystem
maintains default video presentation data without requiring
that data to be received via a network transmission.
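
The receiving side's handling of null video units may be illustrated by the following minimal sketch. The function name frames_shown and the convention that a unit is a (timestamp_ms, data) pair whose data is None for a null unit are assumptions made here for illustration.

```python
def frames_shown(units):
    """Return the frame displayed at each time stamp, repeating the last frame on null units.

    units: iterable of (timestamp_ms, data); data is None for a null presentation unit.
    """
    shown = []
    last_frame = None  # nothing has been presented before the first real frame arrives
    for timestamp_ms, data in units:
        if data is not None:
            last_frame = data
        # A null unit carries only timing, so the most recent frame is presented again.
        shown.append((timestamp_ms, last_frame))
    return shown
```

For the unit sequence A, null, null, B, this sketch displays A three times and then B, with no frame data for the two null units having crossed the network.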

As will be recognized by those skilled in the art of
compression technology, it is alternatively possible, using
appropriate compression techniques, to substitute
partial media information, rather than null information, to
increase or decrease the transmission rate of presentation
streams containing presentation units that will not
be presented.

Rate control of interleaved streams is provided by
the remote network stream I/O manager upon receipt of
the stream tokens from the stream pipes. Here, the remote
network stream I/O manager scales 154 the
stream tokens as they are processed to form transmittal
packets. This is accomplished by processing the stream
in a scheme whereby the remote network stream I/O
manager skips over appropriate tokens and substitutes
virtual presentation unit tokens in their place, depending
on the compression technology used, to achieve the
specified stream rate.

In this common and important situation of interleaved
stream storage, the remote network stream I/O
manager participates in stream data flow and thus may
be characterized with a particular process cycle and
process period. During each of its process cycles, the
remote network stream I/O manager processes a single
presentation unit and determines if the next sequential
presentation unit is to be transmitted based on a transmit
decision scheme. Like the process decision
schemes described above in connection with synchronization
techniques, the transmit decision scheme is implemented
based on the timing technique of the stream
being processed; if the stream presentation units include
embedded time stamps, then the transmit decision
scheme is based on an explicit timing count, while
implicit timing counting is employed otherwise.

No matter which agent provides the scaling function,
only video streams are scaled, while audio stream
presentation frames and tokens are processed at the
full nominal presentation rate, without skipping any audio
presentation frames; this preservation of audio presentation
rate inherently prioritizes audio streams over
video streams.

The scaling function is, as explained above, de- 
pendent on the compression technology employed for 
a particular frame or stream group. Using, e.g., a key 
frame-based compression technique, a key frame is an 
independently selectable frame within a stream that
contains information required for decompression of all
the following non-key frames dependent on that key 
frame. Dependent, or non-key, frames are not inde- 
pendently selectable. The motion JPEG format relies on 
a scheme in which every frame in a stream is a key 
frame. During the scaling operation, only key frames are 
skipped over, whereby all non-key frames associated 
with the skipped key frame are also skipped over. Null 
frames are then substituted for the key frame and all of 
its corresponding non-key frames. 
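
Skipping at key frame boundaries may be sketched as shown below. The function name skip_key_groups, the (is_key, payload) frame representation, the use of None for a null frame, and the rule of keeping every n-th key frame group are hypothetical choices made for this illustration.

```python
def skip_key_groups(frames, keep_every):
    """Keep every `keep_every`-th key frame group and null out all other groups.

    frames: list of (is_key, payload) pairs; a group is a key frame together with
    the non-key frames that follow it up to the next key frame.
    Returns a list of payloads in which each skipped frame is replaced by None.
    """
    scaled = []
    group_index = -1  # no key frame has been seen yet
    for is_key, payload in frames:
        if is_key:
            group_index += 1
        # Frames preceding the first key frame cannot be decoded, so they are nulled too.
        keep = group_index >= 0 and group_index % keep_every == 0
        scaled.append(payload if keep else None)
    return scaled
```

Because a dependent frame is only ever dropped together with its key frame, no retained frame is left without the information needed to decompress it. For a format in which every frame is a key frame, each group contains a single frame and the sketch reduces to skipping individual frames.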

An implicit timing rate control scheme and an ex- 
plicit timing rate control scheme may be devised. Like 
the synchronization techniques described previously, 
the implicit rate control scheme is based on a counting 
technique and does not require embedded time codes 
on the stream presentation frames. The explicit rate 
control scheme is based on the use of time stamps for 
explicitly determining the presentation and duration time 
of a given frame. In either implementation, virtual pres- 
entation units are generated to accommodate skipped 
stream frames. 

In addition, in either implementation, when skipped 
stream frames later become available, they are identi- 
fied and skipped over, thereby being deleted, rather than 
presented. This presentation unit deletion function, like 
that employed in the synchronization schemes, main- 
tains a current sequential stream progression. 

Adaptive Load Balancing 

The DVMS of the invention includes the ability to 
automatically and dynamically sense the load of a pack- 
et network in which the system is implemented. Based 
on the sensed loading, the stream rate control mecha- 
nism described above is employed by the system to cor- 
respondingly and adaptively balance the load within the 
network, thereby optimizing the network utility availabil- 
ity. 

Referring to Fig. 11B, in this load balancing
scheme, the local network stream I/O manager 90 monitors
206 the stream pipes 32 currently transmitting
streams between that manager and the local stream interpreter
28 for variations in the average queue size, i.e.,
availability of presentation unit tokens, of each pipe.
When the average queue size varies significantly, the
local network stream I/O manager detects the direction
of the change, i.e., larger or smaller. Thereafter it notifies
208 the local stream controller 24 of the change and
requests a new stream presentation token rate to be
transmitted as a percentage of the nominal presentation
rate, based on the change. In turn, the local stream controller
transmits the request to the remote stream controller
84, which in response instructs the remote
stream I/O manager 86 and the remote network stream
I/O manager 88 to adjust the stream presentation unit
rate to the requested rate.

The requested rate is based on the average queue
size in the following scheme. When the queue size increases
significantly above a prespecified upper availability,
the requested rate is increased; the increased
availability indicates that high-speed processing may be
accommodated. Conversely, when the queue size decreases
significantly below a prespecified lower availability,
the requested rate is decreased; the decreased
availability indicates that the current rate cannot be accommodated
and that a lower rate is preferable.
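
The threshold scheme just described may be sketched as follows. The function name requested_rate, the fixed step of 10 percentage points, and the clamping of the result between the step size and 100 percent are assumptions introduced for this illustration; the description above specifies only the direction of the adjustment.

```python
def requested_rate(current_percent, avg_queue_size, lower, upper, step=10):
    """Return the new rate to request, as a percentage of the nominal presentation rate.

    lower, upper: prespecified lower and upper availability thresholds for the
    average number of presentation unit tokens queued in a stream pipe.
    """
    if avg_queue_size > upper:
        # Tokens are piling up: a higher rate can be accommodated.
        return min(100, current_percent + step)
    if avg_queue_size < lower:
        # The pipe is draining: the current rate cannot be sustained.
        return max(step, current_percent - step)
    # Within the thresholds no change is requested.
    return current_percent
```

With thresholds of 4 and 10 tokens and a current rate of 80 percent, an average queue size of 12 leads to a request for 90 percent, a size of 2 leads to a request for 70 percent, and a size of 7 leaves the rate unchanged.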

Alternatively, a user may specify a desired stream
presentation rate, that specification being accepted 204
by the local stream controller 24. In turn, the local stream
controller sends the request to the remote stream controller
for implementation.

In the corresponding reverse process, in which stream frames
are stored after being recorded via the local DVMS manager,
the remote stream I/O manager scales 200 the stream before
storage to reconstruct the stream such that it no longer
includes null frames. This function may also be accomplished
by the local network stream I/O manager in a scaling process
166 completed before a stream is transmitted.
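
As a rough illustration of this reconstruction step, the
following Python sketch drops null frames from a recorded
sequence before it is stored. The representation is an
assumption made for the example only: a null frame is
modelled as None and a real frame as a bytes object, and the
function name strip_null_frames is hypothetical.

```python
from typing import Iterable, List, Optional

# Assumption for this sketch: None stands in for a null frame.
Frame = Optional[bytes]


def strip_null_frames(frames: Iterable[Frame]) -> List[bytes]:
    """Return the frames in their original order with null frames removed."""
    return [frame for frame in frames if frame is not None]
```

For instance, a recorded sequence holding three real frames
and three null frames would be stored as the three real
frames in their original order.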

The DVMS of the invention has been described with particular
detail relating to a preferred embodiment. Other embodiments
are intended to fall within the scope of the invention. For
example, while the DVMS of the invention has been described
in a scheme for managing audio and video streams, other media
data stream types, e.g., stills, accessed from various media
data access points, e.g., a PBX server, are within the scope
of the claims. If the DVMS is implemented on a computer
system or network in software, programming languages other
than the C programming language may be employed, as will be
clear to those skilled in the art of programming.
Alternatively, the DVMS may be implemented entirely in
hardware using standard digital design techniques, as will
also be clear to those skilled in the art of digital hardware
design.



Claims 

1. A computer-based media data processor for controlling the
timing of computer processing of digitized continuous
time-based media data composed of a sequence of presentation
units, each unit having a prespecified presentation duration
during a computer presentation of the media data, the media
data processor comprising:

a reference clock which indicates a start time of
presentation processing of the media data presentation units
and which maintains a current presentation time as the media
data presentation unit sequence is processed for
presentation;

a counter for counting each presentation unit in the
presentation unit sequence after that presentation unit is
processed for presentation to maintain a current presentation
unit count; and

a comparator connected to the reference clock and the
counter, and programmed with the prespecified presentation
duration, the comparator comparing a product of the
presentation duration and the current presentation unit
count, specified by the counter, with the current
presentation time, specified by the reference clock, after
each presentation unit is processed for presentation and
based on the comparison, releasing a next sequential
presentation unit to be processed for presentation when the
product matches the current presentation time count, and
deleting a next sequential presentation descriptor in that
sequence when the product exceeds the current presentation
time count.

2. The media data processor of claim 1 further comprising a
flow controller, connected to said comparator, for receiving
an indication from the comparator that a presentation unit
should be released for presentation, determining availability
of a next presentation unit in the presentation unit sequence
to be processed, and based on that availability, generating
and releasing a virtual presentation unit of the prespecified
presentation duration to be presented as a default
presentation unit in place of a next presentation unit when a
next presentation unit is not available and until the next
presentation unit is available.

3. The media data processor of claim 2 wherein the flow
controller is adapted to monitor and identify a previously
unavailable presentation unit when that unit is later
available and in response to identification of the later
available unit, withholding the unit from release from
presentation, whereby the later available unit is not
presented.

4. The media data processor of claim 1 wherein:

the clock is adapted to indicate a start time of presentation
processing of a plurality of media data presentation unit
sequences, the start time being common to the plurality of
sequences, and which maintains a current presentation time as
the media data sequences are processed for presentation;

the counter counts each presentation unit in the plurality of
presentation unit sequences after that presentation unit is
processed for presentation to maintain a distinct current
presentation unit count for each presentation unit sequence;
and

the comparator is connected to the reference clock and the
counter, and programmed with the prespecified presentation
duration, the comparator comparing for each of the plurality
of presentation unit sequences a product of the presentation
unit duration and the current presentation unit count of that
sequence, specified by the counter, with the current
presentation time, specified by the reference clock, after
each presentation unit from that sequence is processed for
presentation, and based on the comparison, releasing a next
sequential presentation unit in that presentation unit
sequence to be processed for presentation when the product
matches the current presentation time count, and deleting a
next sequential presentation unit in that presentation unit
sequence when the product exceeds the current presentation
time count, whereby the plurality of media data presentation
unit sequences are synchronously processed for substantially
simultaneous synchronous presentation.

5. A computer-based media data processor for con- 
trolling the computer presentation of digitized con- 
tinuous time-based media data composed of a se- 
quence of presentation units, each unit having a 
prespecified presentation duration and presenta- 
tion time during a computer presentation of the me- 
dia data and further characterized as a distinct me- 
dia data type, the media data processor comprising: 

a media data input manager for retrieving media data from a
corresponding media data access location in response to a
request for computer presentation of specified presentation
unit sequences, determining the media data type of each
presentation unit in the retrieved media data, designating
each retrieved presentation unit to a specified media data
presentation unit sequence based on the media data type
determination for the presentation unit, assembling a
sequence of presentation descriptors for each of the
specified presentation unit sequences, each presentation
descriptor comprising media data for one designated
presentation unit in that sequence, all presentation
descriptors in an assembled sequence being of a common media
data type, and linking the presentation descriptors in each
assembled sequence to establish a progression of presentation
units in each of the sequences; and
a media data interpreter, connected to the media data input
manager, for accepting from the media data input manager the
assembled presentation descriptor sequences one descriptor at
a time and releasing the sequences for presentation one
presentation unit at a time, indicating a start time of
presentation processing of the presentation unit sequences,
maintaining a current presentation time as the sequences are
processed for presentation, counting each unit in the
sequences after that unit is released to be processed for
presentation, to maintain a distinct current presentation
unit count for each sequence, comparing for each of the
presentation unit sequences a product of the presentation
duration and the current presentation unit count of that
sequence with the currently maintained presentation time
after each unit from that sequence is processed for
presentation, and based on the comparison, releasing for
presentation processing a next sequential presentation unit
in that sequence when the product matches the currently
maintained presentation time count, and deleting a next
sequential presentation unit in that sequence when the
product exceeds the currently maintained presentation time
count.

6. The media data processor of claim 5 further comprising a
presentation unit sequence controller for initiating the
media data input manager and the media data interpreter,
specifying to the media data input manager and the media data
interpreter the presentation unit sequences to be presented,
and controlling starting and stopping of sequence
presentation in response to user specification.

7. The media data processor of claim 5 wherein the media data
retrieved by the media data input manager comprises a storage
presentation unit sequence composed of presentation units for
the specified presentation unit sequences, presentation units
of the specified presentation unit sequences being
alternately interleaved to compose the storage presentation
unit sequence.

8. The media data processor of claim 5 wherein the media data
retrieved by the media data input manager comprises a
plurality of storage presentation unit sequences, each
storage presentation unit sequence composed of presentation
units for a specified presentation unit sequence and all
presentation units in a storage presentation unit sequence
being of a common media data type.

9. The media data processor of claim 6 wherein the retrieved
media data presentation units are encoded in a storage code
and compressed, and further comprising a presentation system
for decoding the presentation units, decompressing the
presentation units, and converting the digitized presentation
units to a corresponding analog representation for
presentation.

10. The media data processor of claim 5 wherein the media
data interpreter maintains the current presentation time at a
prespecified time rate such that presentation units of the
specified presentation sequences are each presented for a
presentation duration different than the prespecified
presentation duration.

11. The media data processor of claim 5 wherein the media
data interpreter monitors for each specified presentation
unit sequence an actual presentation rate of the presentation
descriptors in that sequence released for presentation,
compares the actual presentation rate with a prespecified
nominal presentation rate, and based on the comparison,
generates and releases a virtual presentation unit of the
prespecified presentation duration to be presented as a
default presentation unit each time the monitored
presentation rate is greater than the prespecified
presentation rate, and based on the comparison, skips over a
presentation unit each time the monitored presentation rate
is less than the prespecified presentation rate.

12. A computer-based media data processor for con- 
trolling transmission of digitized media data in a 
packet switching network, the media data compris- 
ing a sequence of continuous time-based presen- 
tation units, each unit having a prespecified pres- 
entation duration and presentation time during a 
computer presentation of the media data and fur- 
ther being a distinct media data type, the network 
comprising a plurality of client computer processing 
nodes interconnected via packet-based data distri- 
bution channels, the media data processor com- 
prising: 

a remote media data controller for receiving from a client
processing node a request for presentation of specified
presentation unit sequences;

a remote media data input manager for receiving from the
media data controller an indication of the specified
presentation unit sequences, and in response to the request,
retrieving media data from a corresponding media access
location, determining the media data type of each
presentation unit in the retrieved media data, designating
each retrieved presentation unit to a specified media data
presentation unit sequence based on the media data type
determination for the presentation unit, assembling a
sequence of presentation descriptors for each of the
specified presentation unit sequences, each descriptor
comprising media data for one designated presentation unit in
that sequence, all presentation descriptors in an assembled
sequence being of a common media data type, and linking the
descriptors in each assembled sequence to establish a
progression of presentation units in each of the specified
presentation unit sequences;

a remote network media data manager connected to the remote
media data input manager, for accepting from the remote media
data input manager the assembled specified presentation
descriptor sequences one presentation descriptor at a time,
assembling transmission presentation unit packets each
composed of at least a portion of a presentation descriptor
and its media data, all presentation descriptors and media
data in an assembled packet being of a common media data
type, and releasing the assembled packets for transmission
via the network to the client processing node requesting
presentation of the specified presentation unit sequences;

a local media data controller for transmitting the request
for presentation of specified presentation unit sequences
from the client processing node to the remote media data
input controller via the network and controlling starting and
stopping of sequence presentation in response to user
specifications;

a local network media data input manager for receiving from
the local media data controller an indication of the
specified presentation unit sequences, receiving the
transmission presentation unit packets transmitted from the
remote network media data manager via the network,
designating a presentation unit sequence for each
presentation descriptor and media data in the received
packets to thereby assemble the presentation descriptor
sequences each corresponding to one specified presentation
unit sequence, all presentation descriptors and media data in
an assembled sequence being of a common media data type, and
linking the descriptors in each assembled sequence to
establish a progression of presentation units for each of the
presentation unit sequences; and
a local media data interpreter connected to the local network
media data input manager, for accepting the assembled
presentation descriptor sequences one descriptor at a time
and releasing the sequences for presentation one unit at a
time, indicating a start time of presentation processing of
the sequences, maintaining a current presentation time as the
descriptor sequences are processed for presentation, and
based on the presentation duration of each presentation unit,
synchronizing presentation of the specified presentation unit
sequences with the current presentation time, wherein the
local media data interpreter synchronizes presentation of the
specified presentation unit sequences by counting each
presentation unit in the sequences after that presentation
unit is released to be processed for presentation to maintain
a distinct current presentation unit count for each sequence,
comparing for each of the presentation unit sequences a
product of the presentation duration and the current
presentation unit count of that sequence with the maintained
presentation time after a presentation unit from that
sequence is released to be processed for presentation, and
based on the comparison, releasing a next sequential
presentation unit in that presentation unit sequence when the
product matches the currently maintained presentation time,
and deleting a next sequential presentation unit in that
presentation unit sequence when the product exceeds the
currently maintained presentation time.

13. The media data processor of claim 8 or 12 wherein at
least one media data input manager associates each
presentation descriptor with a corresponding presentation
duration and presentation time, based on the retrieved media
data.

14. The media data processor of claim 1, 5 or 12 wherein the
media data presentation unit sequence comprises a video frame
sequence including a plurality of intracoded video frames.

15. The media data processor of claim 14 wherein each of the
plurality of intracoded video frames comprises a key frame
and is followed by a plurality of corresponding non-key
frames, each key frame including media data information
required for presentation of the following corresponding
non-key frames.

16. The media data processor of claim 12 wherein the local
media data interpreter synchronizes presentation of the
specified presentation unit sequences by comparing for each
of the presentation descriptors in each of the presentation
descriptor sequences the presentation time corresponding to
that descriptor with the currently maintained presentation
time, and based on the comparison, releasing a next
sequential presentation unit to be processed for presentation
when the corresponding presentation time of that descriptor
matches the current presentation time, and deleting a next
sequential presentation unit to be processed for presentation
when the current presentation time exceeds the corresponding
presentation time of that descriptor.